Storing large XML in MongoDB

Tags: xml, mongodb

I have a fairly large XML file (>10 MB in size, with 40+ elements). We currently store such XML in an Oracle database and use XQuery to query and retrieve parts of it. This process is slow and takes many database calls. We are exploring MongoDB as a place to store and query this XML.
I just converted the XML to JSON and loaded it into a MongoDB collection; the load was very fast, and the XML nodes are stored as nested documents. But when I query (using find) for an innermost element, it always returns the whole document, including nodes whose element values do not match. I expect to get back only the few nodes that match the given value.
Please let me know if there is a recommended way to store such large XML files in MongoDB, and how to retrieve only the inner nodes that have the exact values specified in the query. Thanks in advance.

asked Sep 28, 2015 by sachin wagh
0 votes

4 Answers

0 votes

Have you thought about trying an up-to-date XML database, such as BaseX? It might give you much better results, particularly if you have used XQuery before anyway.

answered Sep 28, 2015 by ashish singh
0 votes

I had the same problem. In my case the top-level node in each XML file always contained a huge list of smaller nodes, so I ended up storing those items instead. To do it, I wrote my own xml-to-json command line tool. I've used it to convert 10GB of XML data into JSON, in a format that mongoimport can eat.
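For illustration only (this is not the tool mentioned above), here is a minimal Python sketch of the same idea: stream the XML, turn each repeated child element into its own small JSON document, and write one document per line, which is the shape mongoimport reads by default. The function names, the `item` tag, and the sample data are assumptions made up for this example; the conversion rules (attributes become keys, repeated tags become lists) are just one possible mapping, and an attribute that shares a name with a child tag is not handled specially.

```python
import io
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert one XML element into plain Python data (hypothetical mapping)."""
    node = dict(elem.attrib)
    text = (elem.text or "").strip()
    children = list(elem)
    if not children:
        # Leaf element: bare text if it has no attributes, else a small dict
        return text if not node else {**node, "text": text}
    for child in children:
        value = element_to_dict(child)
        if child.tag in node:
            # Repeated tag: collect the values into a list
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    return node


def xml_items_to_json_lines(source, item_tag):
    """Yield one JSON string per <item_tag> element while streaming the input."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == item_tag:
            yield json.dumps(element_to_dict(elem))
            elem.clear()  # drop the finished subtree to keep memory use low


# Made-up sample input; a real run would pass a file path or open file instead
sample = (
    b"<catalog>"
    b"<item id='1'><name>a</name></item>"
    b"<item id='2'><name>b</name></item>"
    b"</catalog>"
)
for line in xml_items_to_json_lines(io.BytesIO(sample), "item"):
    print(line)
```

Storing each item as its own document keeps documents small and lets a query return just the items that match, rather than one huge parent document.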

answered Sep 28, 2015 by devkumargupta
0 votes

There are several facts you should keep in mind:

Number 1 - MongoDB returns the whole document when it matches; there is no feature to return only part of it (as of 10 October 2011), so if you need that filtering you have to implement it in your own code.

Number 2 - Pay attention to the $elemMatch operator. It requires all the given conditions to be met within the same subdocument (array element) rather than anywhere throughout the whole document, so this may be the source of the confusion.

Number 3 - Unlike in an RDBMS, there is no single right strategy for dividing your aggregate into collections in MongoDB. A different data representation might therefore solve your case.

Number 4 - Despite the "no right way" remark in number 3, there is a general recommendation to keep your documents under 10 MB in size (note that MongoDB itself caps a single BSON document at 16 MB).
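As a rough sketch of the "filter in your own code" approach from number 1, the Python below walks a document that has already been fetched and keeps only the nested nodes whose field has the wanted value. The function name `matching_nodes` and the sample document are hypothetical; in practice `doc` would be whatever your driver returned from a find.

```python
def matching_nodes(node, key, value):
    """Recursively yield every nested dict whose `key` equals `value`."""
    if isinstance(node, dict):
        if node.get(key) == value:
            yield node
        for child in node.values():
            yield from matching_nodes(child, key, value)
    elif isinstance(node, list):
        for child in node:
            yield from matching_nodes(child, key, value)


# Hypothetical document, shaped like XML that was converted to nested JSON
doc = {
    "order": {
        "id": "A1",
        "items": {
            "item": [
                {"sku": "X", "status": "shipped"},
                {"sku": "Y", "status": "pending"},
                {"sku": "Z", "status": "shipped"},
            ]
        },
    }
}

# Only the two inner nodes with status "shipped" are kept
shipped = list(matching_nodes(doc, "status", "shipped"))
for node in shipped:
    print(node["sku"])
```

This still transfers the whole document from the server, so it fixes the result shape but not the amount of data read.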

answered Sep 28, 2015 by kotmus2002
0 votes

This is the expected behavior when filtering on multi-level embedded documents: a matching filter normally returns the whole document, not just the matching subset.
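One possible workaround, in MongoDB versions that include the aggregation framework, is to unwind the embedded array and match again on the individual elements. The sketch below only builds such a pipeline as plain Python data; the helper name `build_inner_node_pipeline`, the `order.items.item` path, and the `status` field are assumptions for illustration, not taken from the question's data.

```python
def build_inner_node_pipeline(array_path, field, value):
    """Build an aggregation pipeline that returns only matching array elements."""
    full_field = f"{array_path}.{field}"
    return [
        # Narrow down to documents that contain at least one matching element
        {"$match": {full_field: value}},
        # Emit one document per element of the embedded array
        {"$unwind": f"${array_path}"},
        # Keep only the elements that actually match
        {"$match": {full_field: value}},
        # Return just the matching inner node
        {"$project": {"_id": 0, "node": f"${array_path}"}},
    ]


# Hypothetical path and field names
pipeline = build_inner_node_pipeline("order.items.item", "status", "shipped")
for stage in pipeline:
    print(stage)
```

With a driver such as pymongo, the resulting list would be passed to the collection's `aggregate` method; that step needs a running server and is left out here.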

Check out my answers to mongodb-querying-array-elements-within-a-document and how-to-find-the-matched-record-in-mongodb for more info.

Maybe you can add a sample of the XML schema you currently have; someone may then be able to help you structure the app.

answered Sep 28, 2015 by r3tt