The Case for a Git-Powered Project Gutenberg…

Posted on February 27, 2012 by Mahmoud Al-Qudsi

Project Gutenberg, for those of you who are not already familiar with it, is one of the single most important community projects of the century: an attempt at creating a digital library of free books in a variety of formats, preserving classics and other works of literature from all ages. At the time of this post, the project boasts an impressive 38,000 works whose copyrights have expired and that have been released into the public domain.

Project Gutenberg (PG from here on out) indexes not only the text of these titles, but also their original illustrations and metadata: author(s), publisher(s), date(s), illustration(s), etc., and, most importantly, bookmarks/tables of contents. The process of “creating” a book comprises many steps: scanning the original book, using OCR to convert the scanned images to text, manually reviewing the text for OCR conversion errors, fixing formatting (footnotes, endnotes, spacing, etc.), marking bookmarks and jump locations, creating tables of contents, and finally, to borrow a software term, “building” the files into many different formats to cover the highly fragmented spectrum of eBook file types.

The reason for this primer on how PG works is to give a sense of how complex the entire endeavor is and of all the steps and components involved in the process. There are probably more steps that are not immediately apparent, and most of the steps listed above can probably be broken up into several more steps each. The point is, it’s an incredibly complicated and error-prone process. And even when it’s done without errors or mistakes, there’s always room for improvement. And this is where the need for version control comes in.


Family Misunderstands Open Source, Panics, & Sues the Wrong Person…

Posted on September 22, 2007 by NeoSmart Technologies

Open source is supposed to be a way of simplifying licensing issues and sharing your software/music/video/other content with the masses, freely and magnanimously. Problem is, what happens when something open source is found to be a (possible) violation of someone else’s rights? What happens to its derivatives? Do they just pack up shop and find something else, or are they legally responsible for their actions? In what seems poised to become a landmark case on this issue, we’re about to find out.

A Texan family is now suing Virgin Mobile for using a photo of their daughter, Alison Chang, in an ad campaign – the catch is, it was released by the photographer on Flickr under the Creative Commons Attribution license, and that’s where Virgin Mobile got the photo from. The problem is, the girl featured in the photo had no idea her photo was being used – or that it was released under the Creative Commons license.

As the case currently stands, the Changs are suing consumers of open source works rather than the party originally responsible for releasing the work as open source material without a proper media consent form.


Why Microsoft Won’t ID Patent Violations…

Posted on May 14, 2007 by NeoSmart Technologies

Earlier today, Microsoft announced it will begin actively seeking reparations for patent infringement by Linux and the Open Source Community in general. Larry Augustin posted his thoughts on the matter, expressing his opinion that it’s fear of having these IP-infringement claims debunked or challenged that’s keeping Microsoft from publishing these 235 alleged infringements to the public – and instead waiting until the open source community comes to the bargaining table. But let’s be realistic, shall we?

If Microsoft Corporation doesn’t have the biggest and baddest team of ~~lawyers~~ law firms, who does? It’s probably safe to assume that more than half of these patent infringements really are just that. Put aside the legitimacy of software patents in the first place and just look at the facts as they stand. Open source software gets its code from millions of developers, and no amount of auditing or quadruple-checking will ensure clean code. Despite Microsoft’s claims that the open source community is “openly and knowingly” engaging in patent violations, that’s most probably not the case.


The NeoSmart Files

Recovery software and more

Tag Archives: copyright
