
Processing Large XML Wikipedia Dumps that won't fit in RAM in Python without Spark

Python's ElementTree library lets you read an XML file of any size, limited only by the time you have to process it. Unlike a DOM parser, it does not need to load the entire XML document into memory. This video shows how all of Wikipedia can be processed in Python without a large amount of RAM.
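Below is a minimal sketch of this streaming approach. It is not the code from the video. It assumes the standard-library `xml.etree.ElementTree.iterparse` function and the usual MediaWiki dump layout (`<page>` elements containing a `<title>` and a `<revision><text>`). The names `iter_pages`, `count_pages`, `local_name`, and `SAMPLE` are hypothetical. Because Wikipedia dumps are distributed bz2-compressed, the sketch also assumes `bz2.open` so the parser can read the compressed file directly. The key step is clearing the root element after each `<page>` ends, so parsed pages are discarded instead of piling up in memory.

```python
import bz2
import io
import sys
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterator, Tuple, Union


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: Union[str, BinaryIO]) -> Iterator[Tuple[str, str]]:
    """Stream (title, text) pairs from a MediaWiki XML dump, one page at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    title, text = "", ""
    for event, elem in context:
        if event != "end":
            continue  # an element's text is only complete at its end event
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = "", ""
            root.clear()  # discard finished pages so memory use stays flat


def count_pages(path: str) -> int:
    """Count pages in a plain or .bz2-compressed dump without loading it into RAM."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as f:
        return sum(1 for _ in iter_pages(f))


# A tiny stand-in for a real dump (hypothetical data), using the MediaWiki export namespace.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Wikipedia</sitename></siteinfo>
  <page><title>Alpha</title><revision><text>first body</text></revision></page>
  <page><title>Beta</title><revision><text>second body</text></revision></page>
</mediawiki>"""

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # e.g. python wiki_stream.py enwiki-latest-pages-articles.xml.bz2 (assumed filename)
        print(count_pages(sys.argv[1]))
    else:
        # No dump given: stream the inline sample instead and print title and body length.
        for page_title, body in iter_pages(io.BytesIO(SAMPLE)):
            print(page_title, len(body))
```

Clearing the root, rather than only the finished `<page>` element, matters because ElementTree still attaches every completed element to its parent. Calling `clear()` on just the page would empty it but leave one element object behind per article, which adds up across millions of pages.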


My blog post for this video:



The code for this video can be found here:


