Ludwig Library is here for you: write amazing texts and boost your English skills

Ludwig Library is the amazing tool you were looking for to boost your English writing skills. Thanks to Ludwig Library, you can create a personal database of English texts to search for the sentences you really need.

Ludwig lets you upload your favorite pieces of writing in PDF format, such as monographs or scientific articles belonging to a very specific field. That way you can, for example, look up technical jargon and verify how a specific word is used in context, taking inspiration from the authors you consider most authoritative.

Ludwig Library will allow you to build a personal data set of reliable sources in order to customize your search experience by creating your own library and consulting it at will.

And do not worry: this personal database will always be yours alone. You will be the only one able to see and consult it!

Write Better English: Go Premium!

The idea behind Ludwig

When we thought about creating Ludwig and building the most reliable linguistic search engine available online thanks to a huge database of correct and reliable sentences, we were faced with one of the most interesting problems in linguistics: how many sentences can there be in a language? Are they infinite? How many sentences do we need to collect and catalog in order to have a meaningful representation of the English language? Languages play on the edge of infinity, and this is a huge problem, philosophically, practically, and computationally.

Languages are finite in the sense that they are made up of a finite number of terms defined by the dictionary. The dictionary grows as new words are added all the time, but at any given moment it is a closed set made up of a certain number of words. Then there are the rules that allow us to combine these words into sentences that are sound.

Indeed, combining words randomly leads to ungrammatical or nonsensical sentences, and nonsensical sentences lead to the Dark Side.

As N. Chomsky (who, by the way, I suspect is closely related to Yoda) highlighted, "The core property of human language, and one of its most distinctive properties is the use of finite means to express an unlimited array of thoughts".

However, thanks to grammar, which is itself finite, we can theoretically generate an infinite number of sentences in a combinatorial way, and this is where word sense, or semantics, comes into play. Indeed, we can have sentences that are perfectly grammatical (syntactically well-formed) but completely nonsensical. See the famous example provided by Chomsky himself: "Colorless green ideas sleep furiously" (Chomsky, Syntactic Structures 1957).
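To make the "finite means, unlimited sentences" idea concrete, here is a minimal sketch in Python. The toy grammar, the word lists and the function names are all assumptions made up for illustration; they have nothing to do with how Ludwig actually works. The only point is that a single recursive rule is enough for a tiny, finite grammar to keep producing new sentences.

```python
# Toy grammar (an illustrative assumption, not Ludwig's model):
#   S  -> NP V
#   NP -> "the" N | "the" N "of" NP   (the second rule is recursive)
NOUNS = ["idea", "laser"]
VERBS = ["sleeps", "shines"]


def noun_phrases(depth: int) -> list[str]:
    """Return every noun phrase with at most `depth` levels of embedding."""
    simple = [f"the {noun}" for noun in NOUNS]
    if depth == 0:
        return simple
    inner = noun_phrases(depth - 1)
    # Apply the recursive rule once on top of every shallower noun phrase.
    nested = [f"the {noun} of {np}" for noun in NOUNS for np in inner]
    return simple + nested


def sentences(depth: int) -> list[str]:
    """Return every sentence whose noun phrase has at most `depth` embeddings."""
    return [f"{np} {verb}" for np in noun_phrases(depth) for verb in VERBS]


# The vocabulary never changes, yet the count grows with every extra level.
for depth in range(4):
    print(depth, len(sentences(depth)))
```

With two nouns and two verbs, the number of sentences goes 4, 12, 28, 60 as the allowed depth grows from 0 to 3, and nothing in the grammar ever forces it to stop. Note that sentences such as "the idea of the laser sleeps" are grammatical under these rules and still make little sense, which is exactly the gap between syntax and semantics.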

Making the potentially infinite manageable

Potentially infinite also means potentially finite. The question is not simple, but from a practical point of view we can frame the problem from a simpler angle, regardless of the infinity of English sentences that have been written, will ever be written, or can potentially be written (and ignoring the fact that languages can also die, as happened to Latin, ancient Greek or Sumerian, and can no longer be infinite from the moment they die). Since the number of both grammatically and semantically correct sentences (whether infinite or not) would be too large for us to handle with the technologies we have, we have to choose a finite set of sentences.

What interests us, however, is that this finite set is as representative of the English language as possible. This means that for every (again, potentially infinite) phrase search that a user makes, there will always be meaningful and useful examples to help the user resolve the linguistic doubt that prompted them to search for a particular phrase.

From a practical point of view, our goal is to collect within our database a “balanced” representation of current English to help people with their daily writing tasks. We don’t need the entire corpus of all the possible writable sentences, just the ones that will be helpful for our users.

OK, we understand that we are dealing with very nerdy stuff right now, but a metaphor can make things clearer.

Let's picture Ludwig's database as a room full of lasers (the sentences, outside our simile) meant to cover all the space of the room (a representation of the English language) in the most effective way. So:

We want to have as many lasers as possible in the room
We also want our lasers to be as spatially discrete as possible (indeed, if all the lasers were focused on a single point, we would not have good coverage of the room!)
We also want them to focus where our users need them the most: people tend to walk on the floor, not float near the ceiling, i.e. if there is a limit on how many lasers we can spread around the room, it's more effective to put one more at human height and one less on the ceiling.

Given this starting point, we have tried to construct the best representation of the English language from a practical point of view. We decided to provide the largest possible number of examples while maximizing the discreteness between sentences. From a computational point of view, we also needed the system to give you an answer as fast as possible.
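For the nerdiest among you, the laser idea can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not the way Ludwig's database is actually built: we measure how different two sentences are with a simple word-overlap (Jaccard) distance, and we greedily pick, under a fixed budget, the sentence that is farthest from everything already chosen. The names `jaccard_distance` and `pick_diverse` are hypothetical.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Distance in [0, 1]: 0 means identical word sets, 1 means no shared words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)


def pick_diverse(candidates: list[str], budget: int) -> list[str]:
    """Greedily choose up to `budget` sentences that are far apart from each other."""
    if budget <= 0 or not candidates:
        return []
    chosen = [candidates[0]]  # assumption: start from the first candidate
    remaining = list(candidates[1:])
    while remaining and len(chosen) < budget:
        # Pick the candidate whose closest already-chosen sentence is farthest away.
        best = max(
            remaining,
            key=lambda s: min(jaccard_distance(s, c) for c in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


candidates = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "stock markets fell sharply today",
]
print(pick_diverse(candidates, 2))
```

With a budget of two, the sketch keeps the first sentence and then prefers the one about stock markets over the near-duplicate about the rug: two lasers pointing at different corners of the room rather than at the same spot. The third requirement, aiming lasers where people actually walk, could be added by weighting each candidate by how often users search for that kind of sentence, but that is left out here.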

As a consequence, we make extensive use of high-quality, general-interest newspapers and media sources, such as the New York Times, the BBC, the Guardian, the New Yorker, etc. These sources form the foundation of our knowledge base because they cover so many topics and therefore provide very discrete examples.

Write better English sentences with Ludwig!

We do have a lot of scientific writing examples too. Here we had to be very picky about language quality, and we only used peer-reviewed journals that satisfy certain quality requirements (e.g. top-notch publishers, impact factor above a fixed threshold, etc.). As for the encyclopedias, we use only the entries that have the Excellent badge on Wikipedia and the Stanford Encyclopedia of Philosophy. We also make use of wikiHow to have some colloquial English and simpler sentences.

On the other hand, even though we have a database of 200M sentences, we are very far from covering the whole English language, and many language niches (e.g. older English, dialects, poetry, technical jargon) are "less represented". Pragmatically, it's simply less likely for a sentence belonging to a very specific niche to be retrieved. In other words, it is less likely that this type of laser is already covering the room of our simile.

Moreover, from a technical point of view, if we were to increase the database unconditionally, not only would the retrieval process become slower and computationally heavier, but the variety of the examples presented (discreteness) would also decrease, impacting both performance and costs.

Why we decided to create Ludwig Library and how it works

We are perfectly aware that besides uniform average needs — strictly connected to the average use of the English language — our users also have very specific needs.

For example, I have a PhD in Medieval History, and for years I have written articles on medieval archeology in Sicily. This is a very specific field, and inevitably our generic database did not cover this scientific niche, and hence my specific writing needs.

When I used to write scientific papers, I often found myself wondering how to express a certain concept and which words I should use. I'd end up looking for keywords in a PDF and wasting a lot of time, often searching my own papers for a certain turn of phrase that I remembered I had written but could not recall.

It was an inefficient process that wasted my time. To overcome this situation, instead of making the database grow enormously, we thought that we could allow users to upload their own documents to Ludwig and grow their own customized version of Ludwig in the direction they deemed most useful for their writing purposes.

Therefore, alongside a well-rounded database good for all seasons, we decided to give users the possibility to upload their own English sources.

I personally started uploading scientific articles in PDF (both the ones I published and those written by other scholars), together with all the gray literature I produced, e.g. drafts of my application letters, CVs, grants and projects.

Library is a very powerful enabler in this regard. It allows you to customize Ludwig's results according to your specific writing needs.

It is important to stress that the sources a user decides to upload are not shared: they are accessible only and exclusively by that specific user. In my case, for example, I uploaded scientific articles that had not been published yet and that I didn't want to be publicly available. For a researcher, the secrecy of their own research is paramount until it’s made public and the intellectual property rights are established.

To make your Library more convenient to search, you can sort your documents into Collections that can be turned on and off at will from the Search Filter in the Search Bar.

All you have to do is manage the search filters: you can turn off all of Ludwig's sources and use only your own. This way, you can consult the database according to your very specific needs, turning sources on and off from time to time to increase the efficiency of the Linguistic Search Engine.

In sum: how can Ludwig Library help you write better English?

We have not solved the problem of the infinity of language, but thanks to our new feature, Library, we are now much more efficient in providing you with a solution tailored to your very specific needs.

Ludwig gives you a balanced and fairly complete representation of the English language; Library adds a layer of customization by covering your specific linguistic niche.

If Ludwig lacks some specialist sources on things I don't write about (in my case Quantum Physics), I don't care. But thanks to Library, in the unlikely case that I start writing about Quantum Physics, I’ll be able to grow Ludwig to accommodate this new writing need.

Bibliography
Pullum, Geoffrey K. and Barbara C. Scholz. 2010. Recursion and the infinitude claim. In Harry van der Hulst (ed.), Recursion and Human Language, 111–138. De Gruyter Mouton.
https://citeseerx.ist.psu.edu/doc/10.1.1.168.5529

Hauser, Marc D., Noam Chomsky, and Warren Tecumseh Fitch. 2002. The faculty of language: What is it, who has it, and how did it evolve? Science 298:1569–1579.

Epstein, Sam and Norbert Hornstein. 2005. Letter on ‘The future of language’. Language 81:3–6.