i4i xml patent, #5787449
    
    Filed 1994, granted July 1998. The Canadian company i4i won a
      $200-million judgment against Microsoft. Many of the court papers are here.
    
    Here is what the patent claims:
      
    A system and method for the separate
      manipulation of the architecture and content of a document, particularly
      for data representation and transformations. The system, for use by
      computer software developers, removes dependency on document encoding
      technology. A map of metacodes found in the document is produced and
      provided and stored separately from the document. The map indicates the
      location and addresses of metacodes in the document. The system allows of
      multiple views of the same content, the
        ability to work solely on structure and solely on content,
      storage efficiency of multiple versions and efficiency of operation.
    
    Despite the phrase "the ability to work solely on structure and solely on
      content", the patent application suggests that at the time of invention
      the i4i claim was all about a perceived improvement on the existing
      practice of mixing tags and content inline:
    
    For manual production of documents the
      intermingling of the markup codes with the content is still the best way
      of communicating structure. For electronic storage and manipulation it
      suffers from a number of shortcomings. 
    
    
    Yet further, there is a difficulty of
      resolving the markup codes from the structure. Markup codes have to be
      differentiated from the content stream they are a part of. This involves
      designating `special` characters or sequences of characters which should
      be identified and acted upon. This complicates the task of any routine
      which must work on the document.
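    The "special characters" problem the patent describes is normally handled
    by reserving a few characters and escaping them wherever they occur in
    content; XML, for example, uses entity references such as &lt; and &amp;.
    A minimal sketch using Python's standard library, with a made-up sample
    string, shows that tags and escaped content share one stream without
    ambiguity:

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Made-up sample content that happens to contain characters used by markup.
raw_text = 'if a < b && b > c then "done"'

# escape() replaces &, < and > with entity references, so the content can
# sit between tags without being mistaken for markup.
embedded = "<Para>" + escape(raw_text) + "</Para>"
print(embedded)

# An XML parser resolves the entity references and returns the original text.
recovered = ET.fromstring(embedded).text
print(recovered == raw_text)  # prints: True
```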
    
    
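    The problem described here is incredibly basic, and markup languages had
    long handled it by reserving a few characters and escaping them when they
    appear in content; it is fundamentally not an issue. The only claimed
    improvement for the i4i approach is processing speed: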
    
    The present invention provides also for efficiency of operation on the
      document. The invention allows document operations to be much more
      efficient. It is no longer necessary to parse the entire document to
      locate the embedded codes. Differentiating codes from content is obviously
      no longer a problem since they are held in different areas. This also
      allows more efficient coding strategies to be developed without the
      restriction of ensuring that all codes are clearly differentiated from any
      possible content. 
    
    A further passage from the patent highlighting the concept of separation:
    
    Thus, in sharp contrast to the prior art the
      present invention is based on the practice of separating encoding
      conventions from the content of a document. The invention does not use
      embedded metacoding to differentiate the content of the document, but
      rather, the metacodes of the document are separated from the content and
      held in distinct storage in a structure called a metacode map, whereas
      document content is held in a mapped content area.
    
    
    It is not unreasonable to believe that i4i's initial patent claims somehow
    got progressively inflated by the time the case reached the court. They may
    have claimed that their method covered any editing of XML that lets the
    user edit either the content or the structure separately. Alternatively,
    Office 12 did include a feature (the XML Data Store) in which some XML could
    be included in the files that would affect the format of the remaining
    document; this might have been the "data structure" that represented
    "distinct storage" of the "metacode map". Still, i4i's patent is actually
    about how to maintain the "metacodes" (XML markup tags) separately from the
    document content, for efficiency reasons; using one XML file to affect how
    another XML file determines a document layout is an entirely different
    thing.
    
    There is admittedly a logical chain from the idea of separating data and
    tags to the idea of separate editing of data and tags. But that chain does
    not appear in the patent. Furthermore, at one point, in discussing the use
    of metacode maps for multiple document views, i4i acknowledges prior art:
    
     In SGML this ability to overlay two or more
      structures on a single set of text is called Concur. Its usefulness has
      long been recognized but it has proven difficult to implement
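    Overlaying several structures on one text can be sketched in Python,
    assuming each view is a list of (tag, start, end) ranges over a single
    shared content string. The "Page" view is a hypothetical layout invented
    for illustration; its ranges cut across a paragraph, which a single
    well-nested set of embedded tags cannot express.

```python
# One copy of the content, shared by every view.
content = "The Secret Life of DataData is hostile.The End"

# Each view is a list of (tag, start, end) ranges over `content`.
logical = [("Title", 0, 23), ("Para", 23, 39), ("Para", 39, 46)]
pages = [("Page", 0, 30), ("Page", 30, 46)]   # hypothetical page layout

def render(content, view):
    """Embed one view's tags in the content (ranges within a view must not overlap)."""
    out, pos = [], 0
    for tag, start, end in sorted(view, key=lambda r: r[1]):
        out.append(content[pos:start])
        out.append(f"<{tag}>{content[start:end]}</{tag}>")
        pos = end
    out.append(content[pos:])
    return "".join(out)

def crossing_pairs(view_a, view_b):
    """Ranges that partially overlap and so cannot nest in one tag hierarchy."""
    return [(a, b) for a in view_a for b in view_b
            if a[1] < b[1] < a[2] < b[2] or b[1] < a[1] < b[2] < a[2]]

print(render(content, logical))
print(render(content, pages))
print(crossing_pairs(logical, pages))
```

    Either view can be rendered as ordinary embedded markup, but not both at
    once: the paragraph "Data is hostile." straddles the boundary between the
    two pages.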
    
    
    More generally there is this principle:
    
    Embedded codes are prior art
    
    The i4i patent application gives a detailed example of how to convert the
    following:
    
    <Chapter><Title>The Secret Life of Data</Title><Para>Data is hostile.</Para><Para>The End</Para></Chapter>
    
    Conceptually, each entry in the metacode map below points to a location in
    the content on the right-hand side. Since arrows cannot be drawn here, each
    tag in the metacode map is followed by a list of numbered pointers, and the
    same numbers mark the corresponding points in the content.
    
    
      
        
    metacode map          content
    <Chapter>   [1]       [1]The Secret Life of Data[2]Data is hostile.[3]The End[4]
    <Title>     [1]
    </Title>    [2]
    <Para>      [2] [3]
    </Para>     [3] [4]
    </Chapter>  [4]
      
    
    
    Block diagrams from the
      patent application are here.
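    The conversion in the Chapter example amounts to a single pass over the
    document. A minimal Python sketch, assuming bare start and end tags only
    (no attributes, comments, entity references, or CDATA), with pointers
    represented as character offsets into the content rather than the numbered
    markers [1] through [4]:

```python
import re

# Bare start or end tag such as <Para> or </Para> (simplifying assumption).
TAG = re.compile(r"</?[A-Za-z][\w.-]*>")

def build_metacode_map(doc):
    """Split inline markup into a metacode map and tag-free content.

    The map sends each tag to the list of content offsets where it occurs.
    """
    metacode_map = {}
    content_parts = []
    offset = 0   # position in the tag-free content
    pos = 0      # position in the original document
    for m in TAG.finditer(doc):
        text = doc[pos:m.start()]
        content_parts.append(text)
        offset += len(text)
        metacode_map.setdefault(m.group(), []).append(offset)
        pos = m.end()
    content_parts.append(doc[pos:])
    return metacode_map, "".join(content_parts)

doc = ("<Chapter><Title>The Secret Life of Data</Title>"
       "<Para>Data is hostile.</Para><Para>The End</Para></Chapter>")
metacode_map, content = build_metacode_map(doc)
for tag, offsets in metacode_map.items():
    print(f"{tag:<11} {offsets}")
print(content)
```

    Offsets 0, 23, 39 and 46 correspond to pointers [1], [2], [3] and [4] in
    the table.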
    
    
    The Markman Hearing
    This is the court's decision on how the claims are to be interpreted. The
    full ruling is here; some quotes follow. Overall, they do not go well for
    Microsoft.
    
     “metacode[s]” means “an individual
      instruction which controls the interpretation of the content of the data.”
    
    
    Note that this is a broad interpretation, as opposed to, say, "a data
    modifier that is stored in a separate structure".
    
     In total, the intrinsic record does not
      rebut the presumption that “mapped content” and “raw content” have
      different meanings. Further, the intrinsic record indicates “raw content”
      is a subset of “mapped content,” and “mapped content” does not need to be
      free of all metacodes.
    
     For the above-mentioned reasons, “mapped
      content” means “the content of a document corresponding to a metacode
      map.” “Metacode map” and “map of metacodes” mean “a data structure that
      contains a plurality of metacodes and their addresses of use corresponding
      to a mapped content.”
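    Under this construction a metacode map needs only a plurality of metacodes
    and their addresses. Going from such a map back to ordinary embedded markup
    is a short merge. The grouped table for the Chapter example does not by
    itself record the order of tags that share an address (such as </Para> and
    <Para> at [3]), so this minimal sketch assumes the map is kept as a list of
    (address, metacode) pairs in document order:

```python
def embed(metacode_map, content):
    """Merge (address, metacode) pairs back into the content stream."""
    out, pos = [], 0
    for address, metacode in metacode_map:   # pairs are in document order
        out.append(content[pos:address])
        out.append(metacode)
        pos = address
    out.append(content[pos:])
    return "".join(out)

content = "The Secret Life of DataData is hostile.The End"
metacode_map = [
    (0, "<Chapter>"), (0, "<Title>"), (23, "</Title>"),
    (23, "<Para>"), (39, "</Para>"), (39, "<Para>"),
    (46, "</Para>"), (46, "</Chapter>"),
]
doc = embed(metacode_map, content)
print(doc)
```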
    
    Finally, note this passage from the ruling:
    
    However, the disclosed algorithms create and
      store the metacode map and mapped content in “storage space” and do not
      require separate files for the metacode map and the mapped content.
    
    Wasn't separate storage the whole point of the original patent?
    
    
    i4i did indeed "invent" something. But what they invented was essentially
    the idea that some customers wanted to edit text documents that had an
    underlying XML structure. Once you realize that customers might pay for
    that, the creation of the actual product is obvious.
    This is very similar to the NTP v RIM case.
    The part about "separate manipulation of the architecture and content of
      a document" sounds deep, or at least nontrivial, except that the patent
      application itself strongly suggests that the invention is really just
      about a specific implementation
        technique for separating architecture from content.
      Virtually all XML-based formats use embedded tags, <foo>like
      this</foo>. The whole point of the i4i patent is that it doesn't
      use embedded tags.
    
    On the other hand, there are suggestions that Microsoft did in fact
      develop a format for creating "custom XML schemas" that used the i4i
      method. Any XML schema that lets you set the tag values in one place and
      one place only, as opposed to doing a global search-and-replace, could be
      said to violate the spirit of the i4i patent.
    
    Still, it is a stretch, to say the least, to believe that the i4i patent
      covers all custom XML schemas.
    
    The following is from the blog of an Office product manager at Microsoft,
      Brian Jones, http://blogs.msdn.com/brian_jones/archive/2005/11/04/integrating-with-business-data-store-custom-xml-in-the-office-xml-formats.aspx.
      
    XML Data Store
    In Office 12, we've introduced a new feature
      to the formats that we're currently calling the XML data store, and the
      way it works is really simple. As you should all know by now, the new
      format consists of a ZIP file with a bunch of XML parts (files) inside. Up
      until now we've talked about all the parts that we in Office have defined
      to create our documents. You as a developer also have the ability to add
      your own parts though. You can take any XML file and put it inside the ZIP
      package. Then all you need to do is create a relationship from the main
      document part to your XML part, and the Office applications will roundtrip
      your XML with the file, which means:
    Roundtripping your data: The ability to
      put your XML in the ZIP package means that you now have a place to store
      any data your solution may need. The data will travel with the document,
      but will always be stored as a separate XML part in the ZIP package. This
      means it's really easy to get to and modify without dealing with any of
      the application's data....
    Separating data from the document: As
      well, because the information is stored in the data store, you
        benefit from the fact that the user cannot directly edit your data by
        editing the document (they can’t accidentally delete part of your
      data, since it’s stored separately.
    
    This is kind of vague; a more concrete example can be found at http://msdn.microsoft.com/en-us/library/bb510135.aspx.
      (Another article on this feature is at http://openxmldeveloper.org/blog/b/openxmldeveloper/archive/2010/10/27/59361.aspx.)
      Note that the feature does allow a separate XML area that is connected
      to the main document only via tags. However, the original i4i patent
      appeared to involve using the separate area for tag values, whereas the
      Microsoft strategy, on the face of it, uses a separate area for entire
      XML files. That said, the last paragraph of the quote is all about the
      real-world importance of separating the tags and storing them elsewhere.
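    The packaging arrangement Brian Jones describes can be sketched with
    Python's standard zipfile module: a custom XML part travels in the same
    ZIP package as the document part but is a separate file, so editing the
    document text never touches it. The part names and XML here are
    illustrative assumptions; a real Office file also needs
    [Content_Types].xml and relationship parts, which this sketch omits, and
    Word's own handling of the part is not modeled.

```python
import io
import zipfile

# Hypothetical custom data a solution wants to carry along with a document.
custom_xml = b'<invoice xmlns="urn:example:invoice"><total>100</total></invoice>'

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pkg:
    # Stand-in for the main document part; its text knows nothing of the invoice.
    pkg.writestr("word/document.xml", "<document><body>Quarterly report</body></document>")
    # The developer's own XML lives in a separate part of the same package.
    pkg.writestr("customXml/item1.xml", custom_xml)

# Reopen the package and read the custom part back, byte for byte.
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as pkg:
    names = pkg.namelist()
    roundtripped = pkg.read("customXml/item1.xml")

print(names)
print(roundtripped == custom_xml)
```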
    
    See also Joe Wilcox's article at www.betanews.com/joewilcox/article/Is-Microsoft-violator-or-victim-in-i4i-patent-dispute/1250119565,
      in which he suggests that the Microsoft customers most interested in this
      new Office feature were those in the pharmaceuticals industry, which
        is exactly the industry i4i writes software for.
    
    At the page www.afterdawn.com/news/article.cfm/2009/08/13/update_microsoft_knew_about_i4i_s_xml_patent,
      there is an alleged quote from "newly leaked" Microsoft emails:
    
    "We saw [i4i's products] some time ago
        and met its creators. Word 11 will make it obsolete," said one email
      from Martin Sawicki, a member of Microsoft's XML for Word development
      team.
    
    That would make the '449 a defensive
        patent: one whose purpose is not patent trolling, but instead to let
      you go after competitors that horn in on your market. This does not
      legitimize the patent completely, but does put it in a different context.
    An excellent technical blog post on the '449 patent is at http://broadcast.oreilly.com/2009/08/mircrosoft-and-the-two-xml-pat.html.
      It has a good example of what metacodes are all about, but also a
      somewhat cryptic discussion of point tags (like <b> in HTML) versus
      range tags (like <title> ... </title>, strictly hierarchical).
      
    
    
    MS information on how the editing works: http://msdn.microsoft.com/en-us/library/aa212889%28office.11%29.aspx.
    
    It appears to be true that Microsoft intended
    to take i4i's broader idea -- supporting the structural editing of XML-based
    documents -- and thus to take over i4i's business niche.
    
    Somehow, i4i convinced a jury in East Texas that their patent covers any
    editing of XML that preserves the document structure, which is what Office
    12 did.
    
    What of the jurors? Did they really think i4i's patent covered what
    Microsoft did, or did they think that Microsoft was trying to crush a
    competitor "unfairly"? Here are some quotes from the jurors, at http://thepriorart.typepad.com/the_prior_art/2010/01/jurors-from-i4i-v-microsoft.html:
    
    Juror BG: “I felt that i4i had a really
      strong case,” she says. “It was evident that Microsoft knew that [i4i] had
      a patent," and still decided “all of a sudden” to create its own version.
      
      Juror JS:  This juror noted that MS had met with i4i at one point:
      "[Microsoft] got their foot in the door and got enough information, and
      then took it.” JS also seemed concerned about Microsoft's lack of vigor in
      pursuing the case. "Two hundred million dollars seems to me like a great
      amount of money…I would think if I was Bill Gates, and had $200 million on
      the line, I would want to be present.”
      
      Juror BC: “It was very plain and very clear, throughout the testimony that
      what Microsoft said and did wasn't right”
    
    What did Microsoft do wrong?
    
    
    After the jury verdict, Microsoft petitioned the District Court for a
    "Judgment as a Matter of Law" (JMoL), meaning that they wanted the judge to
    declare that the jury verdict contradicted the existing law in the case;
    that is, to find "there is no legally sufficient
    evidentiary basis for a reasonable jury to find as the jury did." A high
    standard has to be met here, but this is indeed the appropriate avenue if
    the jury misunderstood the patent. However, the judge also
    misunderstood the patent; he wrote (in http://pld.cs.luc.edu/ethics/i4i_v_microsoft_district_jmol.pdf):
    
    The
      ‘449 patented invention created a reliable method of processing and
      storing content and metacodes separately and distinctly. The data
      structure primarily responsible for this separation is called a “metacode
      map.” According to the patent, the
        “metacode map” allows a computer to manipulate the structure of a
        document without reference to the content. [p 2]
    
    
    The metacode map is a data structure that once upon a time might have saved
    some computing resources, but which is trivial to work around by leaving the
    tags "in place" in the document. The metacode map has nothing to do with the
    idea of manipulating the XML structure without referring to the content,
    except in that it might suggest one
    possible way to do that.
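    As an illustration, here is a minimal sketch using Python's standard
    xml.etree.ElementTree: the tags stay embedded in the document, yet the
    parsed tree lets a program list and rename the structure without touching
    the content. Renaming Para to Paragraph is an arbitrary example edit.

```python
import xml.etree.ElementTree as ET

doc = ("<Chapter><Title>The Secret Life of Data</Title>"
       "<Para>Data is hostile.</Para><Para>The End</Para></Chapter>")
root = ET.fromstring(doc)

def text_of(el):
    """All text content under an element, with the tags dropped."""
    return "".join(el.itertext())

# Structure-only view: the element names, in document order.
outline = [el.tag for el in root.iter()]
print(outline)

before = text_of(root)
# Structural edit: rename every Para element; no text node is modified.
for el in list(root.iter("Para")):
    el.tag = "Paragraph"

print(ET.tostring(root, encoding="unicode"))
print(text_of(root) == before)
```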
    
    However, here's the district court opinion on data structures:
    
    First, Microsoft argues that i4i presented
      no evidence that the accused WORD products created “a data structure” as
      required by the Court’s construction of the claim term “metacode map.” The
      Court construed and instructed the jury that “metacode map” and “map of
      metacodes” in the ‘449 patent meant “a
        data structure that contains a plurality of metacodes and their
        addresses of use corresponding to mapped content.” The Court
      further construed “mapped content” as meaning “the content of a document
      corresponding to a metacode map.”
    
    
    Essentially, i4i managed to claim that any way of storing "metacodes",
    including embedding them in the body of the document, amounts to storing
    them in a "data structure" as covered by the patent, even though the stated
    point of the patent was that this data structure be "separate".
    
    During trial Dr. Rhyne, one of i4i’s
      technical experts, explained that the meaning of “a data structure” was “a
      physical or logical relationship among data elements designed to support
      specific data manipulation functions.”
    
    
    In other words, embedded XML tags would now be a "data structure" too. 
    
    All this suggests that i4i has figured out how to expand their original
    claims. The expanded claim is clearly still tied to the invention, and so
    the court elected to uphold it, but the expansion so waters down the
    original idea as to turn it into something genuinely obvious.
    
    Maybe Microsoft's core problem is that they were not able to find a short
    and comprehensible way to say the following:
    
        embedded codes are
      prior art.
    
    
    Microsoft appealed the case to the Federal Circuit, and then to the Supreme
    Court. But you cannot appeal a finding of fact as to claim interpretation.
    [pld: some issues of claim construction can be appealed; I'm still
    working on why Microsoft did not prevail here.]
    
    The issue Microsoft brought to the Supreme Court was the fairness of the
    presumption that patents are valid, which requires "clear and convincing
    evidence" to invalidate a patent. The Supreme Court upheld this standard.
    It declined to apply a weaker "preponderance of the evidence" standard even
    to prior art that the patent office had never considered, though it allowed
    that a jury may be instructed to take that fact into account when weighing
    the evidence. That didn't help Microsoft, which probably wanted a new trial
    in order to give their legal arguments a second hearing.
    
    Discussion:
    
    - Is i4i a "patent troll"?
    - How does this case affect the rest of us?
    - What did i4i really invent?