i4i XML patent, #5,787,449
Filed 1994, granted July 1998. The Canadian company i4i won a
$200-million judgement against Microsoft. Many of the court papers are here.
Here is what the patent
claims:
A system and method for the separate
manipulation of the architecture and
content of a document, particularly for data representation and
transformations. The system, for use by computer software
developers,
removes dependency on document encoding technology. A map of
metacodes
found in the document is produced and provided and stored
separately from
the document. The map indicates the location and addresses of
metacodes in
the document. The system allows of multiple views of the same
content, the
ability to work solely on structure and solely on content, storage
efficiency of multiple versions and efficiency of operation.
Despite the italicized phrase, the patent application suggests that at the time of
invention the i4i claim was all about a perceived improvement on the
existing practice of mixing tags and content inline:
For manual production of documents the intermingling of the markup codes with the content is still the
best way of communicating structure. For electronic storage and manipulation it suffers from a number of shortcomings.
Yet further, there is a difficulty of resolving the markup codes from
the structure. Markup codes have to be differentiated from the content
stream they are a part of. This involves designating `special`
characters or sequences of characters
which should be identified and acted upon. This complicates the task of
any routine which must work on the document.
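This is incredibly basic: any parser for a markup language already has to
tell tags from content, typically by escaping the special characters. It is
fundamentally not an issue. The only
claimed improvement for the i4i approach is processing speed: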
The present invention provides also for efficiency of operation on the
document. The invention allows document operations to be much more
efficient. It is no longer necessary to parse the entire document to
locate the embedded codes.
Differentiating codes from content is obviously no longer a problem
since they are held in different areas. This also allows more efficient
coding strategies to be developed without the restriction of ensuring
that all codes are clearly differentiated
from any possible content.
A further patent claim highlights the concept of separation:
Thus, in sharp contrast to the prior art the present invention is based
on the practice of separating encoding conventions from the content of a
document. The invention does not use embedded metacoding to
differentiate the content of the
document, but rather, the metacodes of the document are separated from
the content and held in distinct storage in a structure called a
metacode map, whereas document content is held in a mapped content area.
It is not unreasonable to believe that somehow i4i's initial patent
claims got progressively inflated by the time the case reached the
court. They may have claimed their method covered any editing of XML
that offers editing of either the content or the structure separately.
Alternatively, Office 12 did include a feature (the XML Data Store) in
which some XML could be included in the files that would affect the
format of the remaining document; this might have been the "data
structure" that represented "distinct storage" of the "metacode map".
Still, i4i's patent is actually about how to maintain the "metacodes"
(XML markup tags) separately from the document content, for efficiency
reasons; using one XML file to affect how another XML file determines a
document layout is an entirely different thing.
There is admittedly a logical chain from the idea of separating data
and tags to the idea of separate editing of data and tags. But that chain
does not appear to be in the patent. Furthermore, at one point, in
discussing using metacode maps for multiple document views, i4i
acknowledges prior art:
In SGML this ability to overlay two or more structures on a single set of text is called Concur. Its usefulness
has long been recognized but it has proven difficult to implement
There is a detailed example in the i4i patent application as to how to convert the following into a metacode map and mapped content:
<Chapter><Title>The Secret Life of Data</Title><Para>Data is hostile. </Para>The End</Chapter>
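As a rough illustration of what converting tagged text like this into a separate metacode map and mapped content involves, here is a minimal Python sketch. The function name split_metacodes and the regex-based tag matching are assumptions made for illustration; they are not taken from the patent, which describes its own algorithms, and the sketch handles only simple tags without attributes.

```python
import re

# Matches a simple opening or closing tag such as <Title> or </Title>.
# Assumption: no attributes, comments, or escaped characters.
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*>")


def split_metacodes(document):
    """Return (metacode_map, mapped_content) for a tagged string.

    metacode_map is a list of (tag, address) pairs, where address is the
    offset into mapped_content at which the tag applies.
    """
    metacode_map = []
    content_parts = []
    content_length = 0
    cursor = 0
    for match in TAG.finditer(document):
        text = document[cursor:match.start()]
        content_parts.append(text)
        content_length += len(text)
        metacode_map.append((match.group(), content_length))
        cursor = match.end()
    content_parts.append(document[cursor:])
    return metacode_map, "".join(content_parts)


doc = ("<Chapter><Title>The Secret Life of Data</Title>"
       "<Para>Data is hostile. </Para>The End</Chapter>")
metacode_map, mapped_content = split_metacodes(doc)
for tag, address in metacode_map:
    print(address, tag)
print(repr(mapped_content))
```

The content ends up as one tag-free string, and each tag is recorded with the offset in that string where it applies.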
Block diagrams from the patent application are here.
The Markman Hearing
A Markman hearing leads to the court's decision on how the patent claims are to be interpreted. The full ruling is here; some quotes follow. Overall, the ruling does not go very well for Microsoft.
“metacode[s]” means “an individual instruction which controls the interpretation of the content of the data.”
In total, the intrinsic record does not rebut the presumption that
“mapped content” and “raw content” have different meanings. Further,
the intrinsic record indicates “raw content” is a subset of “mapped
content,” and “mapped content” does not need to be free of all
metacodes.
For the abovementioned reasons, “mapped content” means “the content of
a document corresponding to a metacode map.” “Metacode map” and “map of
metacodes” mean “a data structure that contains a plurality of
metacodes and their addresses of use corresponding to a mapped content.”
However, the disclosed algorithms create and store the metacode map and
mapped content in “storage space” and do not require separate files for
the metacode map and the mapped content.
i4i did indeed "invent" something. But what they invented was
essentially the idea that some customers wanted to edit text documents
that had an underlying XML structure. Once you realize that customers
might pay for that, the creation of the actual product is obvious. This is very similar to the NTP v. RIM case.
The part about "separate manipulation of the architecture and
content of a document" sounds deep, or at least nontrivial, except that
the patent application itself strongly suggests that the invention is
really just about a specific implementation technique for separating architecture from content. Virtually all
flavors of XML use embedded tags, <foo>like this</foo>. The
whole point of the i4i patent is that it doesn't use embedded tags.
On the other hand, there are suggestions that Microsoft did in fact develop
a format for creating "custom XML schemas" that used the i4i method.
Any XML schema that lets you set the tag values in one place and one
place only, as opposed to doing a global search-and-replace, could be
said to violate the spirit of the i4i patent.
Still, it is a stretch, to say the least, to believe that the i4i patent covers all custom XML schemas.
The following is from the blog of an Office product manager at
Microsoft, Brian Jones, http://blogs.msdn.com/brian_jones/archive/2005/11/04/integrating-with-business-data-store-custom-xml-in-the-office-xml-formats.aspx.
XML Data Store
In Office 12, we've introduced a new
feature to the formats that
we're currently calling the XML data store, and the way it works is
really simple. As you should all know by now, the new format consists of
a ZIP file with a bunch of XML parts (files) inside. Up until now we've
talked about all the parts that we in Office have defined to create our
documents. You as a developer also have the ability to add your own
parts though. You can take any XML file and put it inside the ZIP
package. Then all you need to do is create a relationship from the main
document part to your XML part, and the Office applications will
roundtrip your XML with the file, which means:
Roundtripping your data: The
ability to put your XML in the
ZIP package means that you now have a place to store any data your
solution may need. The data will travel with the document, but will
always be stored as a separate XML part in the ZIP package. This means
it's really easy to get to and modify without dealing with any of the
application's data....
Separating data from the document:
As well, because the
information is stored in the data store, you benefit from the fact that
the user cannot directly edit your data by editing the document
(they
can’t accidentally delete part of your data, since it’s stored
separately.
This is kind of vague; a more concrete example can be found at http://msdn.microsoft.com/en-us/library/bb510135.aspx. (Another article on this feature is at http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2010/10/27/59361.aspx.) Note that it indeed allows a separate
XML area that is connected to the main document only via tags. However,
the original i4i patent appeared to involve using the separate area for
tag values; the Microsoft strategy on the face of it is for a separate area for entire XML files. The
last paragraph is all about the real-world importance of separating the
tags and storing them elsewhere.
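The packaging idea in the Brian Jones quote can be sketched with Python's standard zipfile module. This is a toy package, not a valid Office file: the part names word/document.xml and customXml/item1.xml follow the layout commonly seen in Office Open XML packages, but the XML bodies are made up for illustration, and the relationship part that the quote mentions is omitted.

```python
import io
import zipfile

# Hypothetical part contents; real Office parts are far more elaborate.
MAIN_PART = "<document><body>Quarterly report</body></document>"
CUSTOM_PART = "<invoice><customer>Contoso</customer></invoice>"

# Build the package in memory: one ZIP entry per XML part.
package = io.BytesIO()
with zipfile.ZipFile(package, "w") as zf:
    zf.writestr("word/document.xml", MAIN_PART)
    zf.writestr("customXml/item1.xml", CUSTOM_PART)

# A business application can read its own data without parsing, or even
# opening, the main document part.
with zipfile.ZipFile(package) as zf:
    part_names = zf.namelist()
    custom_data = zf.read("customXml/item1.xml").decode("utf-8")

print(part_names)
print(custom_data)
```

Note that what is stored separately here is an entire XML file, tags and all, not a table of tag positions for the main document.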
See also Joe Wilcox's article at www.betanews.com/joewilcox/article/Is-Microsoft-violator-or-victim-in-i4i-patent-dispute/1250119565,
in which he suggests that the Microsoft customers most interested in
this new Office feature were those in the pharmaceuticals industry, which is exactly what i4i writes software
for.
At the page www.afterdawn.com/news/article.cfm/2009/08/13/update_microsoft_knew_about_i4i_s_xml_patent,
there is an alleged quote from "newly leaked" Microsoft emails:
"We saw [i4i's products] some time
ago and met its creators. Word 11
will make it obsolete," said one email from Martin Sawicki, a member
of Microsoft's XML for Word development team.
That would make the '449 patent a defensive
patent: one whose purpose is not patent trolling, but instead to let
you fight back against competitors that horn in on your market. This
does not legitimize the patent
completely, but does put it in a different context.
An excellent technical blog on the '449 patent is at http://broadcast.oreilly.com/2009/08/mircrosoft-and-the-two-xml-pat.html.
There's a good example of what metacodes are all about, but also a
somewhat cryptic discussion of point tags (like <b> in html)
versus range tags (like <title> ... </title>, strictly
hierarchical).
MS information on how the editing works: http://msdn.microsoft.com/en-us/library/aa212889%28office.11%29.aspx.
It appears to be true that Microsoft intended
to take i4i's broader idea -- supporting the structural editing of
XML-based documents -- and thus to take over i4i's business niche.
Somehow, i4i convinced a jury in East Texas that their patent covers any editing of XML that preserves the structure, which is what Office 12 did.
What of the jurors? Did they really think i4i's patent covered what
Microsoft did, or did they think that Microsoft was trying to crush a
competitor "unfairly"? Here are some quotes from the jurors, at http://thepriorart.typepad.com/the_prior_art/2010/01/jurors-from-i4i-v-microsoft.html:
Juror BG: “I felt that i4i had a really strong case,” she
says. “It was evident that Microsoft knew that [i4i] had a patent," and
still decided “all of a sudden” to create its own version.
Juror JS: This juror noted that MS had met with i4i at one point:
"[Microsoft] got their foot
in the door and got enough information, and then took it.” JS also
seemed concerned about Microsoft's lack of vigor in pursuing the case.
"Two
hundred million dollars seems to me like a great amount of money…I
would think if I was Bill Gates, and had $200 million on the line, I
would want to be present.”
Juror BC: “It was very
plain and very clear, throughout the testimony that what Microsoft said
and did wasn't right”
What did Microsoft do wrong?
After the jury verdict, Microsoft petitioned the District Court for a
"Judgement as a Matter of Law" (JMoL), meaning that they wanted the
judge to declare that the jury verdict contradicted the existing law in
the case; that is, to find "there is no legally sufficient
evidentiary basis for a reasonable jury to find as the jury did." A high standard has to be met here, but this is indeed the
appropriate avenue if the jury misunderstood the patent. However, the
judge also misunderstood the patent; he wrote (in http://cs.luc.edu/pld/ethics/i4i_v_microsoft_district_jmol.pdf):
The
‘449 patented invention created a reliable method of processing and
storing content and metacodes separately and distinctly. The data
structure primarily responsible for this separation is called a
“metacode map.” According to the patent, the “metacode map” allows a computer to manipulate the structure of a document without reference to the content. [p 2]
The metacode map is a data structure that once upon a time might have
saved some computing resources, but which is trivial to work around by
leaving the tags "in place" in the document. The metacode map has
nothing to do with the idea of manipulating the XML structure without
referring to the content, except in that it might suggest one possible way to do that.
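To make this concrete, the following Python sketch changes a document's structure by editing only a metacode map, never reading the content, and then puts the tags back inline. The map and content literals correspond to the chapter example from the patent application quoted earlier; the helper names rename_tag and embed are hypothetical. The last lines perform the same rename directly on embedded tags, which is the workaround described above.

```python
# A metacode map in the sense of the court's construction: tags plus the
# addresses (offsets into the content) at which they apply.
metacode_map = [
    ("<Chapter>", 0), ("<Title>", 0), ("</Title>", 23),
    ("<Para>", 23), ("</Para>", 40), ("</Chapter>", 47),
]
mapped_content = "The Secret Life of DataData is hostile. The End"


def rename_tag(tag_map, old, new):
    """Rename an element by touching only the map, not the content."""
    renames = {f"<{old}>": f"<{new}>", f"</{old}>": f"</{new}>"}
    return [(renames.get(tag, tag), address) for tag, address in tag_map]


def embed(tag_map, content):
    """Rebuild a conventional document with the tags back inline."""
    pieces = []
    cursor = 0
    for tag, address in tag_map:
        pieces.append(content[cursor:address])
        pieces.append(tag)
        cursor = address
    pieces.append(content[cursor:])
    return "".join(pieces)


restructured = embed(rename_tag(metacode_map, "Para", "P"), mapped_content)
print(restructured)

# The same structural edit with the tags left in place.
inline = embed(metacode_map, mapped_content)
inline_renamed = inline.replace("<Para>", "<P>").replace("</Para>", "</P>")
print(inline_renamed == restructured)
```

Both routes produce the same document. The inline version relies on literal angle brackets in content being escaped, as they are in well-formed XML.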
However, here's the district court opinion on data structures:
First, Microsoft argues that i4i
presented no evidence that the accused WORD products created “a data
structure” as required by the Court’s construction of the claim term
“metacode map.” The Court construed and instructed the jury that
“metacode map” and “map of metacodes” in the ‘449 patent meant “a data structure that contains a plurality of metacodes and their addresses of use corresponding to mapped content.” The Court further construed “mapped content” as meaning “the content of a document corresponding to a metacode map.”
Essentially, i4i managed to claim that any way of storing "metacodes", including embedding them in the body of the document,
amounts to storing them in a "data structure" as covered by the patent,
even though the stated point of the patent was that this data structure
be "separate".
During trial Dr. Rhyne, one of i4i’s
technical experts, explained that the meaning of “a data structure” was
“a physical or logical relationship among data elements designed to
support specific data manipulation functions.”
In other words, embedded XML tags would now be a "data structure" too.
All this suggests that i4i has figured out how to
expand their original claims. The expanded claim is clearly still tied
to the invention, and so the court elected to uphold it, but the
expansion so waters down the original idea as to turn it into something
genuinely obvious.
Maybe Microsoft's core problem is that they were not able to find a short and comprehensible way to say the following:
embedded codes are prior art.
Microsoft appealed the case to the Federal Circuit, and then to the
Supreme Court. But you cannot appeal a finding of fact as to claims
interpretation. The issue MS brought to the Supreme Court was the
fairness of the presumption that patents are valid, which requires
"clear and convincing evidence" to overturn a patent. The Supreme Court
upheld this standard, and declined to substitute a weaker
"preponderance of the evidence" standard even for prior art that the
patent office had not previously considered; it noted only that a jury
may be told when evidence was never before the patent office. That
didn't help Microsoft.
Discussion:
- Is i4i a "patent troll"?
- How does this case affect the rest of us?
- What did i4i really invent?