Data Load
In [17]:
from datasets import load_dataset
data = load_dataset('squad', split='train')
df = data.to_pandas()
df.drop_duplicates(subset='context', keep='first', inplace=True)
In [18]:
df
Out[18]:
 | id | title | context | question | answers |
---|---|---|---|---|---|
0 | 5733be284776f41900661182 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | To whom did the Virgin Mary allegedly appear i... | {'text': ['Saint Bernadette Soubirous'], 'answ... |
5 | 5733bf84d058e614000b61be | University_of_Notre_Dame | As at most other universities, Notre Dame's st... | When did the Scholastic Magazine of Notre dame... | {'text': ['September 1876'], 'answer_start': [... |
10 | 5733bed24776f41900661188 | University_of_Notre_Dame | The university is the major seat of the Congre... | Where is the headquarters of the Congregation ... | {'text': ['Rome'], 'answer_start': [119]} |
15 | 5733a6424776f41900660f51 | University_of_Notre_Dame | The College of Engineering was established in ... | How many BS level degrees are offered in the C... | {'text': ['eight'], 'answer_start': [487]} |
20 | 5733a70c4776f41900660f64 | University_of_Notre_Dame | All of Notre Dame's undergraduate students are... | What entity provides help with the management ... | {'text': ['Learning Resource Center'], 'answer... |
... | ... | ... | ... | ... | ... |
87574 | 5735d0026c16ec1900b92815 | Kathmandu | Institute of Medicine, the central college of ... | Of what university is the Institute of Medicin... | {'text': ['Tribhuwan'], 'answer_start': [46]} |
87579 | 5735d07d012e2f140011a087 | Kathmandu | Football and Cricket are the most popular spor... | Along with cricket, what sport is highly popul... | {'text': ['Football'], 'answer_start': [0]} |
87584 | 5735d0f46c16ec1900b92823 | Kathmandu | The total length of roads in Nepal is recorded... | As of 2004, how many kilometers of road existe... | {'text': ['17,182'], 'answer_start': [54]} |
87589 | 5735d1a86c16ec1900b92831 | Kathmandu | The main international airport serving Kathman... | What is Nepal's primary airport for internatio... | {'text': ['Tribhuvan International Airport'], ... |
87594 | 5735d259012e2f140011a09d | Kathmandu | Kathmandu Metropolitan City (KMC), in order to... | In what US state did Kathmandu first establish... | {'text': ['Oregon'], 'answer_start': [229]} |
18891 rows × 5 columns
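The row labels in the output above jump from 0 to 5 to 10 because `drop_duplicates(subset='context', keep='first')` keeps only the first row for each distinct passage and leaves the original index untouched. A plain-Python sketch of the same keep-first rule, using made-up rows rather than SQuAD data (the `rows` list and the `keep_first_per_context` helper are illustrative names, not part of the notebook):

```python
def keep_first_per_context(rows):
    """Keep the first row seen for each distinct 'context' value."""
    seen = set()
    kept = []
    for index, row in enumerate(rows):
        if row["context"] not in seen:
            seen.add(row["context"])
            # Store the original position, mirroring how pandas keeps the old index.
            kept.append((index, row))
    return kept


# Hypothetical rows: several questions share the same passage.
rows = [
    {"context": "passage A", "question": "q1"},
    {"context": "passage A", "question": "q2"},
    {"context": "passage B", "question": "q3"},
    {"context": "passage A", "question": "q4"},
    {"context": "passage B", "question": "q5"},
]

kept = keep_first_per_context(rows)
kept_indices = [index for index, _ in kept]
print(kept_indices)
```

If contiguous row labels are preferred after deduplication, `df.reset_index(drop=True)` renumbers them.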
In [19]:
df.iloc[0]['context']
Out[19]:
'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.'
In [20]:
df.iloc[0]['question']
Out[20]:
'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?'
In [21]:
df.iloc[0]['answers']
Out[21]:
{'text': array(['Saint Bernadette Soubirous'], dtype=object), 'answer_start': array([515], dtype=int32)}
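In SQuAD, each `answer_start` value is the character offset of the matching answer text inside `context`. A minimal sketch of a sanity check for that relationship, assuming a record shaped like the `answers` dictionary above (the helper name `answer_spans_match` and the toy passage are made up for illustration):

```python
def answer_spans_match(context, answers):
    """Return True if every answer text sits at its answer_start offset in context."""
    return all(
        context[start:start + len(text)] == text
        for text, start in zip(answers["text"], answers["answer_start"])
    )


# Toy record, not taken from the dataset.
context = "Paris is the capital of France."
answers = {"text": ["France"], "answer_start": [context.index("France")]}
print(answer_spans_match(context, answers))

# A wrong offset is detected.
bad_answers = {"text": ["France"], "answer_start": [0]}
print(answer_spans_match(context, bad_answers))
```

The same helper could be applied to a real row, for example `answer_spans_match(df.iloc[0]['context'], df.iloc[0]['answers'])`, since slicing accepts the NumPy integers stored in `answer_start`.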
In [24]:
df.shape
Out[24]:
(18891, 5)
In [25]:
df.head(2)
Out[25]:
 | id | title | context | question | answers |
---|---|---|---|---|---|
0 | 5733be284776f41900661182 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha... | To whom did the Virgin Mary allegedly appear i... | {'text': ['Saint Bernadette Soubirous'], 'answ... |
5 | 5733bf84d058e614000b61be | University_of_Notre_Dame | As at most other universities, Notre Dame's st... | When did the Scholastic Magazine of Notre dame... | {'text': ['September 1876'], 'answer_start': [... |
Embedding API
In [26]:
import os
from openai import OpenAI
MODEL = "text-embedding-ada-002"
In [27]:
# Read the key from the environment and pass it to the client explicitly
api_key = os.getenv('OPENAI_API_KEY')
client = OpenAI(api_key=api_key)
In [28]:
# Request a text embedding for a single input string
res = client.embeddings.create(
    input="I love openai",
    model=MODEL
)
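The call above returns a `CreateEmbeddingResponse`; the vector itself is available as `res.data[0].embedding`, which for `text-embedding-ada-002` is a list of 1536 floats. The notebook does not show at this point how the vectors are used. A common next step is to compare two embeddings with cosine similarity, so here is a minimal sketch under that assumption, using short made-up vectors instead of real API output (`cosine_similarity` is a hypothetical helper, not an OpenAI function):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for embeddings (assumption: real ones have 1536 entries).
query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]

sim_same = cosine_similarity(query, same_direction)
sim_orth = cosine_similarity(query, orthogonal)
print(sim_same, sim_orth)
```

With real data, the two arguments would be the `embedding` lists of two responses, for example a question and a candidate context.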
In [29]:
res
Out[29]:
CreateEmbeddingResponse(data=[Embedding(embedding=[-0.010811165906488895, -0.015025688335299492, -0.019898518919944763, -0.019599905237555504, 0.012154926545917988, 0.003408605232834816, -0.003242331789806485, 0.01315256766974926, -0.0015125791542232037, -0.014102701097726822, 0.020712917670607567, 0.017957530915737152, 0.009772805497050285, -0.03339041769504547, -0.009772805497050285, 0.012915033847093582, 0.027798201888799667, 0.014347021467983723, 0.020427878946065903, -0.009012698195874691, -0.00526645639911294, 0.005996023304760456, -0.025422867387533188, -0.02782534994184971, 0.006932584103196859, -0.0004040189669467509, 0.008530844934284687, -0.028829775750637054, -0.019884943962097168, -0.03368903324007988, 0.03982418403029442, 0.009548845700919628, -0.009650645777583122, -0.004475809168070555, -0.020414305850863457, -0.002981045050546527, -0.0029267517384141684, -0.0040312823839485645, 0.024418441578745842, 0.006525383796542883, 0.006983484141528606, 0.0061046103946864605, -0.004998382646590471, -0.018771931529045105, -0.03203308582305908, -0.0007189629250206053, -0.013736221008002758, -0.0005162111483514309, -0.005863683298230171, 0.019857797771692276, -0.015066408552229404, 0.02416054718196392, 0.001823917729780078, -0.005398796405643225, 0.008612285368144512, 0.008476551622152328, -0.03463916480541229, 0.008904111571609974, 0.015324302017688751, -0.003732668701559305, 0.013491900637745857, -0.004764242563396692, -0.011428752914071083, 0.0280153751373291, -0.0016669759061187506, 0.014577768743038177, 0.009779592044651508, 0.022463878616690636, 0.02498852089047432, -0.005500596482306719, 0.027567455545067787, -0.005263063125312328, -0.022328145802021027, 0.008598711341619492, 0.0014065374853089452, 0.0016169243026524782, -0.012480687350034714, -0.010817952454090118, -0.011170859448611736, -0.006477877032011747, 0.0037123088259249926, -0.019789930433034897, 0.025626467540860176, 0.014591341838240623, 0.0077368044294416904, 0.00013806633069179952, 
-0.00021367408044170588, -0.00027761724777519703, 0.0012071790406480432, -0.03173447027802467, 0.005028923042118549, 0.00689865043386817, 0.025667188689112663, 0.014686355367302895, -0.05413048341870308, 0.013851594179868698, -0.005191802978515625, 0.0027078816201537848, 0.004170408938080072, -0.02967132441699505, -0.0011214973637834191, -0.0017933776834979653, -0.019572757184505463, -0.01134052686393261, -0.015052835457026958, 0.011109779588878155, 0.016301583498716354, 0.0037733889184892178, 0.04183303564786911, 0.02169019915163517, 0.0037632088642567396, 0.02937271073460579, -0.016953103244304657, -0.04340754449367523, -0.04001421108841896, -0.013838021084666252, 0.02956273779273033, -0.005113756284117699, -0.031463004648685455, -0.00934524554759264, 0.0018340976675972342, 0.026087962090969086, 0.031191537156701088, 0.0010807772632688284, 0.007974337786436081, 0.011313379742205143, -0.011625566519796848, -0.013865168206393719, 0.0013853291748091578, -0.01622014306485653, 0.023454733192920685, 0.0215951856225729, 0.01905697025358677, -0.0036172952968627214, -0.028965510427951813, 0.022694626823067665, -0.014442034997045994, -0.003654622007161379, -0.008198297582566738, -0.006423583719879389, 0.023563319817185402, 0.02574862726032734, -0.008612285368144512, -0.010784019716084003, -0.01947774365544319, 0.008938045240938663, 0.01437416858971119, 0.004452055785804987, 0.00987460557371378, 0.005300389602780342, 0.014360594563186169, -0.014686355367302895, -0.028178256005048752, 0.018581904470920563, -0.0007736804545857012, -0.0074992710724473, -0.0049678427167236805, 0.020807931199669838, -0.006949550472199917, -0.015731502324342728, 0.012731794267892838, -0.008395111188292503, -0.03849399462342262, -0.023956947028636932, 0.018079690635204315, 0.028694042935967445, 0.03740812838077545, 0.012324593961238861, 0.0008436680072918534, -0.008327244780957699, 0.01415699440985918, 0.04074716940522194, -0.04419479891657829, 0.004082182422280312, -0.008849818259477615, 
0.018649769946932793, 0.01693953014910221, 0.015894383192062378, -0.028232550248503685, 0.01878550462424755, 0.009392752312123775, 0.011075845919549465, 0.033634740859270096, 0.01935558393597603, 0.006284457165747881, -0.006976697128266096, 0.016871662810444832, -0.007010630797594786, -0.008944831788539886, 0.0013115240726619959, 0.002757085021585226, 0.0054361228831112385, 0.0012631691060960293, -0.004394369199872017, -0.6523890495300293, -0.0484568290412426, -0.003624082077294588, -0.01954561099410057, 0.032250259071588516, 0.024662761017680168, 0.010573633015155792, -0.012324593961238861, 0.00642697699368, 0.026725908741354942, -0.006430370267480612, -0.004187375772744417, -0.020740065723657608, 0.004628509283065796, 0.047913894057273865, -0.02847686968743801, 0.02413340099155903, -0.03675660863518715, -0.0072753108106553555, 0.011442326940596104, -0.024581320583820343, 0.010071419179439545, -0.001692425925284624, -0.0008669971721246839, 0.008815884590148926, 0.01698024943470955, 0.005524349864572287, 0.0028741550631821156, -0.005113756284117699, -0.012691074050962925, -0.01654590293765068, 0.019192704930901527, 0.004102542530745268, -0.019396305084228516, 0.05141581594944, -0.003196521895006299, -0.011333739385008812, 0.027119535952806473, 0.007675724569708109, 0.023753346875309944, -0.022721773013472557, -0.0035765753127634525, 0.0049678427167236805, -0.004489382728934288, -0.01598939672112465, 0.008524058386683464, 0.022531745955348015, -0.010546485893428326, -0.0015931709203869104, -0.015772221609950066, 0.01184952724725008, 0.008347604423761368, 0.010370032861828804, 0.003858222160488367, -0.006742557045072317, -0.00960992556065321, 0.03710951283574104, -0.025259988382458687, 0.008741231635212898, -0.006969910580664873, -0.002149677835404873, 0.018975531682372093, -0.040367115288972855, -0.014618488028645515, -0.025789348408579826, -0.02528713457286358, -0.0028402216266840696, 0.0004292569065000862, -0.022504599764943123, -0.04264743626117706, 
0.030974363908171654, -0.004482595715671778, -0.0070852842181921005, 0.006617003586143255, -0.0025840247981250286, 0.031897351145744324, 0.023658333346247673, -0.010722939856350422, -0.01763176918029785, 0.018609050661325455, 0.015541475266218185, 0.004469022620469332, 0.004374009091407061, -0.014944248832762241, 0.033906206488609314, 0.0047846026718616486, -0.01338331401348114, -0.0014693142147734761, 0.010098565369844437, -0.021215131506323814, 0.020577184855937958, 0.033308979123830795, 0.004445269238203764, -0.025327853858470917, 0.02113369293510914, -0.012141353450715542, -0.009257018566131592, -0.00850369781255722, 0.005439516622573137, -0.028504015877842903, -0.004567429423332214, -0.013865168206393719, 0.019124837592244148, -0.009616712108254433, 0.014727074652910233, -0.0005497203092090786, -0.015731502324342728, 0.02122870460152626, 0.003254208480939269, -0.03325468674302101, -0.016695208847522736, -0.001327642472460866, -0.007207443937659264, -0.015270008705556393, 0.01622014306485653, -0.026345854625105858, -0.007981124334037304, 0.009175578132271767, 0.01892123743891716, -0.01892123743891716, 0.008971978910267353, 0.0017331460257992148, -0.012317807413637638, 0.0011019855737686157, 0.008035417646169662, 0.012663926929235458, -0.0004928819253109396, -0.012901460751891136, -0.01698024943470955, -0.0028571882285177708, 0.025558602064847946, -0.0027502982411533594, 0.006657723803073168, 0.0049678427167236805, -0.0037801754660904408, -0.025802921503782272, 0.021174412220716476, 0.007539990823715925, 0.018744783475995064, -0.012799660675227642, -0.014984969049692154, 0.0026145647279918194, 0.011611993424594402, 0.010756872594356537, -0.015812942758202553, -0.01723814383149147, 0.005520956590771675, 0.00927737820893526, 0.007607857696712017, -0.007906471379101276, 0.00013605153071694076, 0.019966384395956993, 0.005313963163644075, 0.017319582402706146, -0.0009136555017903447, -0.006443943828344345, -0.0164780355989933, 0.004991596098989248, 
-0.014672781340777874, 0.0016669759061187506, 0.020889371633529663, -0.017808223143219948, 0.0007796188001520932, -0.002804591553285718, -0.009508125483989716, 0.0009526789071969688, -0.01430630125105381, 0.0152428625151515, -0.02967132441699505, -0.03078433685004711, 0.007994698360562325, -0.031788766384124756, 0.005870469845831394, 0.021948091685771942, -0.021948091685771942, 0.012480687350034714, -0.031191537156701088, -0.026115108281373978, -0.004353648982942104, -0.013172927312552929, 0.017618196085095406, 0.016396595165133476, -0.007024203892797232, -0.0038344687782227993, 0.020115692168474197, 0.003936268854886293, 0.004146655555814505, 0.0362679660320282, -0.00203939457423985, 0.005042496137320995, -0.0034357518889009953, 0.029046950861811638, -0.011924180202186108, 0.027730336412787437, 0.014238434843719006, 0.0018612444400787354, 0.015840088948607445, -0.005375043023377657, 0.018514037132263184, 0.01951846480369568, 0.010492192581295967, 0.004217915702611208, 0.006786670535802841, -0.0077300178818404675, 0.02050931751728058, -0.031164390966296196, 0.0033373453188687563, -0.013899100944399834, 0.013335807248950005, 0.02293894626200199, -0.006002810318022966, 0.018025396391749382, -0.002570451470091939, -0.012120993807911873, 0.010675433091819286, 0.028694042935967445, -0.0023719414602965117, 0.0018714243778958917, -0.011035126633942127, 0.028368283063173294, -0.0007388987578451633, 0.004438482690602541, 0.000666790409013629, -0.014862808398902416, 0.013783727772533894, 0.005945123266428709, 0.011639139614999294, 0.007173510733991861, 0.013410461135208607, -0.0006354020442813635, -0.020590757951140404, -0.003203308442607522, -0.011808807030320168, 0.009107711724936962, 0.02369905449450016, -0.01799825020134449, 0.034992072731256485, 0.005022136028856039, 0.030811484903097153, -0.00919593870639801, 0.006348930299282074, -0.001321704126894474, 0.012697860598564148, -0.010376819409430027, 0.009413111954927444, 0.02284393273293972, 0.02697022818028927, 
-0.004116115625947714, -0.052664563059806824, -0.010112139396369457, -0.026915935799479485, -0.013084701262414455, -0.008490124717354774, 0.0072753108106553555, 0.0032457252964377403, -0.030241403728723526, -0.008103284984827042, 0.004611542448401451, 0.013817661441862583, 0.022165266796946526, 0.013573341071605682, 0.011225152760744095, 0.007336390670388937, 0.008069351315498352, -0.0012495956616476178, -0.011279446072876453, -0.01287431363016367, -0.0038650089409202337, -0.006891863886266947, -0.0018816044321283698, -0.0023380080237984657, -0.0065864636562764645, -0.002108957851305604, -0.025640040636062622, 0.026277989149093628, 0.015473608858883381, 0.012290660291910172, 0.002058057812973857, -0.0039803823456168175, 0.03387906029820442, 0.006196230184286833, -0.04248455911874771, 0.032548870891332626, 0.029155537486076355, 0.003186341840773821, -0.003647835459560156, -0.0392269566655159, -0.007438190747052431, -0.01273858081549406, 0.007329604122787714, 0.013091487810015678, 0.006314997095614672, -0.0034951353445649147, 0.0017390843713656068, -0.008035417646169662, 0.012249940074980259, 0.019586332142353058, -0.015595768578350544, 0.013159354217350483, -0.031001511961221695, 0.0033644919749349356, 0.0038548288866877556, -0.008205085061490536, -0.01061435230076313, 0.02939985692501068, -0.009861032478511333, 0.008693724870681763, -0.0030641816556453705, -0.010044272057712078, -0.04427623748779297, 0.013912674970924854, -0.01262320764362812, 0.001795074320398271, -0.011788446456193924, -0.0015753558836877346, -0.010661859065294266, -0.010770446620881557, -0.017753930762410164, 0.021730918437242508, 0.01196490041911602, -0.00949455238878727, 0.0038853688165545464, -0.013586914166808128, 0.0010629622265696526, 0.08583781123161316, 0.011788446456193924, 0.005439516622573137, 0.010356458835303783, -0.01389231439679861, 0.008673365227878094, -0.022531745955348015, -0.007302457466721535, -0.021174412220716476, 0.0024228414986282587, -0.028504015877842903, 
0.00013594549091067165, 0.010573633015155792, 0.012338167056441307, 0.013050767593085766, -0.0004860952903982252, -0.0012411123607307673, -0.027404576539993286, -0.002602688269689679, -0.007356750778853893, 0.0010171522153541446, 0.01786251738667488, 0.0028300415724515915, 0.009779592044651508, 0.02175806649029255, 0.01053291279822588, 0.002190397819504142, 0.022273853421211243, -0.02815110981464386, -0.02969847060739994, 0.023522600531578064, 0.003203308442607522, 0.00030752099701203406, 0.010946899652481079, -0.01624728925526142, -0.013356167823076248, 0.01507998164743185, 0.015867235139012337, 0.00696312403306365, -0.014984969049692154, 0.006895257160067558, 0.01537859532982111, 0.008985552005469799, -0.0375710092484951, -0.00473030935972929, -0.029779911041259766, -0.023088254034519196, 0.015568622387945652, -0.024079106748104095, -0.03501921892166138, -0.007797884289175272, -0.007519631180912256, -0.028123963624238968, 0.00478799594566226, -0.007010630797594786, 0.04609506577253342, -0.04134439677000046, -0.004017708823084831, -0.0023380080237984657, -0.004404549021273851, -0.004350255709141493, -0.03586076572537422, -0.009888178668916225, -0.016043689101934433, -0.003817502176389098, -0.035290688276290894, 0.007044564001262188, -0.0046828025951981544, -0.014048407785594463, 0.023848360404372215, 0.004095755517482758, -0.014360594563186169, -0.040855757892131805, -0.01155091356486082, 0.019789930433034897, -0.0023685479536652565, 0.008958404883742332, -0.017876090481877327, 0.0075467778369784355, 0.02152731828391552, 0.0004241669084876776, -0.011014766059815884, -0.0025738447438925505, -0.004815142601728439, -0.01198526006191969, -0.0034442353062331676, -0.005727950017899275, -0.011591632850468159, 0.014279155060648918, 0.018744783475995064, 0.04215879738330841, 0.013200074434280396, 0.019029824063181877, 0.024214841425418854, 0.008483338169753551, 0.015487181954085827, 0.028694042935967445, 0.00764857791364193, -0.008232231251895428, -0.004377402365207672, 
0.004947482608258724, -0.02007497102022171, -0.027241695672273636, 0.006589856930077076, 0.006885077338665724, 0.01415699440985918, 0.023495454341173172, 0.035453565418720245, -0.011829166673123837, -0.010553272441029549, 0.011102993041276932, 0.007512844167649746, 0.011747727170586586, -0.013675141148269176, 0.0029216615948826075, -0.0018714243778958917, -0.005490416660904884, 0.011198006570339203, -0.010227512568235397, -0.01026823278516531, -0.005368256475776434, -0.005996023304760456, 0.015120701864361763, 0.010845099575817585, 0.011130140163004398, 1.781500941433478e-05, 0.011428752914071083, -0.014170568436384201, -0.01377694122493267, 0.017129557207226753, -0.020156411454081535, 0.003949842415750027, -0.008130431175231934, -0.02762174978852272, 0.006298030260950327, -0.0027146681677550077, -0.012358526699244976, -0.017115982249379158, -0.01345118135213852, -0.01431987527757883, -0.046692293137311935, 0.012915033847093582, 0.006532170344144106, 0.00017698363808449358, -0.01674950309097767, -0.03292892500758171, -0.0017314492724835873, 0.020129265263676643, -0.0035494286566972733, 0.03472060710191727, -0.029101243242621422, 0.023427587002515793, -0.014102701097726822, -0.000769438745919615, -0.01635587587952614, -0.020115692168474197, -0.021730918437242508, -0.003698735497891903, 0.03417767211794853, 0.015731502324342728, 0.032413139939308167, 0.0013463058276101947, 0.0507642962038517, -0.007404257543385029, 0.0059417299926280975, -0.028395429253578186, 0.010132499039173126, 0.0010366638889536262, -0.014591341838240623, 0.008910898119211197, -0.0032287584617733955, -0.010790806263685226, 0.015337875112891197, -0.011781659908592701, -0.014754221774637699, 0.011530552990734577, -0.007974337786436081, -0.03442199155688286, 0.0012580790789797902, -0.02660374902188778, 0.007417831104248762, 0.013105060905218124, -0.009358818642795086, 0.01839187741279602, -0.0042280955240130424, 0.0018086476484313607, 0.026875214651226997, 0.01296932715922594, 0.029155537486076355, 
-0.008680151775479317, 0.017034543678164482, -0.003674982115626335, -0.034530580043792725, 0.018907664343714714, 0.005544709973037243, -0.023061105981469154, -0.00036075396928936243, -0.015419315546751022, -0.0007244770531542599, 0.014903528615832329, 0.031028658151626587, 0.016722356900572777, 0.010851886123418808, -0.0023210414219647646, -0.023712627589702606, 0.015174995176494122, -0.0005123935989104211, -0.0012054824037477374, 0.004380795639008284, -0.028232550248503685, -0.014089128002524376, -0.02956273779273033, -0.004777816124260426, -0.030377136543393135, -0.02616940252482891, -0.00891768466681242, -0.010281805880367756, 0.021663052961230278, -0.0077368044294416904, -0.013593700714409351, 0.015880808234214783, 0.034829191863536835, 0.03501921892166138, 0.003902335651218891, 0.013593700714409351, 0.018025396391749382, 0.009338458999991417, -0.019437024369835854, 0.009019484743475914, -0.01210063323378563, -0.002382121281698346, 0.009779592044651508, 0.027255268767476082, -0.017658917233347893, -0.0061113969422876835, 0.013369740918278694, 0.0003645714605227113, -0.029101243242621422, -0.025721481069922447, 0.008693724870681763, -0.012921820394694805, 0.013539407402276993, -0.026875214651226997, -0.008048991672694683, -0.0222467053681612, 0.019871370866894722, -0.01598939672112465, 0.0014285941142588854, -0.010729726403951645, 0.0027180614415556192, -0.011164072901010513, -0.011245513334870338, -0.0017967710737138987, 0.006365897133946419, 0.006298030260950327, 0.01445560809224844, -0.01252819411456585, -0.006664510350674391, 0.0013234007637947798, 0.004703162703663111, 0.005666869692504406, 0.01354619488120079, -0.007539990823715925, -0.00307775498367846, 0.011483046226203442, -0.004099148791283369, -0.005873863585293293, -0.021187985315918922, -0.0035697887651622295, 0.03501921892166138, -0.04031282290816307, -0.007831817492842674, -0.02238244004547596, -0.009480978362262249, 0.016097983345389366, -0.011564486660063267, -0.013593700714409351, 
0.00020837198826484382, 0.012677500955760479, 0.013593700714409351, -0.0020326077938079834, -0.00829331111162901, -0.005096789449453354, 0.0020207311026751995, -0.016993822529911995, -0.022192412987351418, 0.0034391453955322504, -0.004930516239255667, 0.010261446237564087, -0.02920982986688614, -0.00598245020955801, 0.01571792922914028, -0.015107128769159317, -0.0014981575077399611, 0.049162641167640686, -0.014360594563186169, 0.005191802978515625, 0.04487346485257149, -0.03045857697725296, 0.004394369199872017, 0.010804379358887672, 0.014279155060648918, -0.019437024369835854, 0.016274435445666313, -0.0027367249131202698, -0.012304233387112617, 0.009243445470929146, 0.005052676424384117, 0.005894223228096962, -0.012731794267892838, -0.003817502176389098, 0.017197422683238983, 0.014442034997045994, 0.004893189296126366, 0.015392168425023556, -0.005337716545909643, 0.014537048526108265, -0.006990270689129829, -0.006450730375945568, 0.019654197618365288, 0.027051668614149094, 0.0018646377138793468, 0.019532037898898125, -0.012942180968821049, 0.0072617377154529095, -0.006277670152485371, 0.019559184089303017, 0.001555844210088253, -0.028042523190379143, -0.016328729689121246, 0.00957599189132452, -0.011483046226203442, 0.008347604423761368, -0.03333612531423569, -0.01056684646755457, -0.008734445087611675, -0.029345562681555748, -0.006342143751680851, 0.009501338936388493, -0.029752762988209724, 0.002387211425229907, 0.03143585845828056, -0.004923729691654444, 0.0009153521968983114, -0.006515203509479761, 0.007316031027585268, 0.009745659306645393, -0.01829686388373375, -0.011578059755265713, -0.00592476362362504, -0.01566363498568535, 0.025992948561906815, 0.021079398691654205, -0.028666896745562553, -0.039634156972169876, -0.0028402216266840696, -0.029481297358870506, 0.0011274355929344893, -0.02132371813058853, 0.030648604035377502, 0.021568039432168007, -0.0004525861004367471, 0.008768378756940365, 0.040068503469228745, 0.0017289043171331286, 
-0.011781659908592701, -0.018568331375718117, 0.0019240210531279445, -0.0010553272441029549, -0.015446461737155914, -0.0004780361196026206, -0.024581320583820343, -0.022884653881192207, -0.009134858846664429, 0.005571856629103422, 0.012406033463776112, -0.0058840434066951275, 0.0077368044294416904, 0.006376076955348253, 0.007791097741574049, -0.011544127017259598, -0.0029657750856131315, 0.013722647912800312, 0.01760462298989296, 0.005911190062761307, -0.03738098219037056, -0.025952227413654327, 0.0032525118440389633, -0.0025908115785568953, 0.013858381658792496, -0.00764857791364193, 0.01061435230076313, 0.017577476799488068, 0.005222342908382416, -0.012168500572443008, 0.014618488028645515, -0.017387449741363525, -0.00306587852537632, 0.020726492628455162, 0.002170037943869829, -0.004876222927123308, 0.01264356728643179, 0.019803505390882492, -0.005907796788960695, -0.014577768743038177, 0.003106598509475589, 0.020577184855937958, -0.009820312261581421, 0.02508353441953659, -0.015188568271696568, 0.015392168425023556, -0.003442538669332862, 0.029318416491150856, -0.02086222544312477, -0.003698735497891903, 0.007187084294855595, 0.005235916469246149, -0.02416054718196392, 0.003345828503370285, -0.0037021287716925144, -0.007234590593725443, 0.003037034999579191, -0.02063147909939289, 0.015365022234618664, 0.0015363325364887714, -0.021880226209759712, -0.0045063490979373455, -0.02828684262931347, 0.003539248602464795, 0.00425184890627861, -0.01809326373040676, -0.01776750385761261, 0.0050187427550554276, 0.0242284145206213, -0.019884943962097168, 0.0066373636946082115, 0.23693624138832092, -0.036539435386657715, -0.0037801754660904408, 0.019097691401839256, -0.014903528615832329, -0.008761591278016567, 0.0008330637938342988, -0.015093555673956871, -0.007017417345196009, 0.005259669851511717, 0.014265581965446472, 0.0017696243012323976, -0.0020376979373395443, -0.011883459985256195, 0.002753691514953971, -0.01799825020134449, -0.028721189126372337, 
-0.0007359295850619674, -0.010940113104879856, -0.0013013441348448396, 0.023509027436375618, -0.010729726403951645, -0.02570790797472, -0.008619071915745735, -0.004927122965455055, 0.02485278807580471, -0.025368575006723404, -0.007119217421859503, 0.002141194650903344, 0.014889955520629883, -0.009569205343723297, 0.0008258529705926776, 0.012412820011377335, -0.019817078486084938, 0.009338458999991417, -0.013627634383738041, -0.006810423918068409, -0.0054870229214429855, 0.030377136543393135, -0.0011418573558330536, 0.013614061288535595, 0.009922112338244915, 0.0016686726594343781, -0.018446169793605804, -0.006715410389006138, 0.021703772246837616, -0.006372683681547642, -0.006667903624475002, 0.0022141512017697096, 0.01848689094185829, -0.024676334112882614, -0.010207152925431728, 0.017577476799488068, 0.020794358104467392, -0.007696084212511778, -0.000778346264269203, -0.004811749327927828, -0.0007444129441864789, -0.006715410389006138, 0.010166432708501816, -0.008605497889220715, 0.016600195318460464, 0.0081575782969594, 0.014442034997045994, -0.03352615237236023, 0.008408685214817524, -0.023414013907313347, 0.002314254641532898, 0.0273638553917408, -0.010295378975570202, 0.0032898385543376207, -0.015188568271696568, 0.006450730375945568, -0.021649479866027832, -0.03572503477334976, -0.013247581198811531, 0.03463916480541229, 0.02815110981464386, 0.004350255709141493, -0.001122345682233572, 0.02136443927884102, -0.010241085663437843, 0.006474483758211136, 0.006983484141528606, -0.03303751349449158, -0.053614698350429535, 0.006325176917016506, -0.0029555950313806534, -0.001966437790542841, -0.015799369663000107, -0.009189152158796787, -0.016803795471787453, 0.006050316616892815, -0.019640624523162842, -0.003389941994100809, -0.00455046258866787, 0.027010949328541756, 0.010974045842885971, 0.012867527082562447, 0.004621722735464573, -0.008055778220295906, -0.012385673820972443, 0.023128973320126534, 0.007920044474303722, 0.00984067190438509, 0.019165556877851486, 
-0.01354619488120079, 0.014591341838240623, 0.012039553374052048, -0.007770737633109093, -0.011516979895532131, -0.02863975055515766, -0.008184724487364292, 0.022898226976394653, -0.009026272222399712, -0.006861323956400156, -0.013824447989463806, 0.0012232973240315914, 0.034259114414453506, -0.011130140163004398, -0.02053646557033062, -0.006969910580664873, -0.007234590593725443, -0.0001964953262358904, -0.012182073667645454, -0.006006203591823578, -0.032684605568647385, 0.0003518464509397745, -0.010845099575817585, -0.011863100342452526, 0.03542641922831535, -0.01645088940858841, -0.0012275390326976776, -0.011096206493675709, 0.010281805880367756, -0.005856896750628948, 0.007668937556445599, -0.008978765457868576, -0.002908088266849518, 0.01400768756866455, -0.008001484908163548, 0.028992656618356705, 0.005765276495367289, -0.025599321350455284, 0.01364799402654171, -0.005453089717775583, 0.010437899269163609, 0.01222958043217659, -0.01168664637953043, 0.01127265952527523, -0.014401314780116081, 0.0013183107366785407, -0.02570790797472, 0.006766310427337885, 0.05244738981127739, -0.0038887623231858015, -0.035290688276290894, 0.0036003286950290203, 0.0034357518889009953, 0.014414887875318527, -0.0454978384077549, 0.011326952837407589, 0.00998997874557972, 0.007953978143632412, -0.03089292347431183, -0.029807057231664658, -0.1750418096780777, 0.010587206110358238, 0.010865459218621254, -0.03941698372364044, 0.026563027873635292, 0.00013234007928986102, 0.027282414957880974, -0.009019484743475914, -0.000416107737692073, -0.0020122479181736708, 0.03062145784497261, -0.00200715777464211, -0.007044564001262188, -0.013050767593085766, 0.024241987615823746, -0.004241669084876776, -0.004621722735464573, 0.014754221774637699, 0.024282706901431084, 0.013593700714409351, 0.01878550462424755, -0.016763076186180115, 0.00033233477734029293, 0.00908056553453207, 0.02139158546924591, 0.008055778220295906, 0.0007991304737515748, 0.031761616468429565, 0.00517483614385128, 
-0.007750377990305424, -0.019654197618365288, -0.03124583140015602, 0.05204018950462341, 0.016192996874451637, 0.03415052592754364, 0.00761464424431324, 0.0212558526545763, -0.01578579656779766, -0.001071445643901825, 0.007757164537906647, -0.004482595715671778, 0.03121868520975113, 0.013057554140686989, -0.021608758717775345, -0.018351156264543533, -0.002986134961247444, 0.030865777283906937, -0.0039057289250195026, 0.007594284135848284, -0.011415179818868637, 0.01445560809224844, -0.0074992710724473, 0.010838313028216362, 0.0037224888801574707, 0.033770471811294556, 0.014360594563186169, -0.0014201108133420348, 0.03686519339680672, -0.0037021287716925144, -0.002735028276219964, 0.00263322819955647, -0.0033220751211047173, 0.013016833923757076, -0.012073487043380737, 0.0012453540693968534, 0.0010417539160698652, -0.01859547756612301, 0.0031829485669732094, -0.02120155841112137, 0.022341718897223473, -0.01691238209605217, 0.011055486276745796, 0.017848944291472435, 0.00374963553622365, 0.014279155060648918, 0.0278524961322546, -0.028205402195453644, 0.02251817286014557, 0.02676662802696228, -0.036077938973903656, -0.013715861365199089, 0.0022650512401014566, -0.01740102283656597, 0.020427878946065903, -0.007811457850039005, 0.008205085061490536, 0.013159354217350483, -0.008809098042547703, -0.02986134961247444, -0.017753930762410164, 0.01172058004885912, 0.0025840247981250286, -0.004771029576659203, -0.0032236685510724783, -0.016722356900572777, 0.026576600968837738, 0.03013281710445881, 0.010492192581295967, -0.00031451976974494755, -0.017577476799488068, -0.017658917233347893, 0.005904403515160084, 0.0016559476498514414, 0.008340817876160145, 0.016030116006731987, 0.005816176533699036, 0.0032966253347694874, -0.001315765781328082, 0.019667770713567734, -0.02367190644145012, 0.0005204527988098562, -0.005493809934705496, 0.004804962780326605, -0.00030476393294520676, -0.01460491493344307, 0.016165848821401596, 0.0004890644340775907, -0.025327853858470917, 
0.03765244781970978, 0.0029437183402478695, 0.03477489948272705, -0.007044564001262188, -0.010180005803704262, 0.030241403728723526, -0.007838604040443897, -0.003001405159011483, -0.09647931158542633, -0.026508735492825508, 0.016070835292339325, 0.03431340679526329, -0.02420126646757126, 0.03643084689974785, -0.001874817768111825, 0.04834824055433273, -0.033471859991550446, 0.01905697025358677, -0.00800827145576477, -0.029019802808761597, -0.021771639585494995, -0.005276636220514774, 0.011611993424594402, -0.0243912935256958, -0.01415699440985918, -0.011856313794851303, -0.02858545631170273, -0.0003845072933472693, 0.005500596482306719, 0.01908411830663681, 0.023576892912387848, -0.003143925219774246, -0.009786378592252731, 0.007193870842456818, -0.04001421108841896, 0.029508443549275398, 0.01026823278516531, 0.010851886123418808, 0.01878550462424755, -0.025232840329408646, 0.006521990522742271, -0.048999760299921036, -0.015432888641953468, 0.002212454564869404, -0.0036885554436594248, -0.022816786542534828, 0.004801569506525993, -0.02858545631170273, 0.0039057289250195026, -0.008727658540010452, 0.023359719663858414, -0.017658917233347893, -0.031625885516405106, 0.006060496903955936, 0.007322817575186491, 0.010410753078758717, 0.015310728922486305, -0.03952556848526001, -0.03501921892166138, -0.013552981428802013, -0.02494780160486698, 0.011869886890053749, 0.019654197618365288, -0.00401431554928422, 0.030567163601517677, 0.006620397325605154, -0.014822088181972504, 0.001978314481675625, -0.01437416858971119, 0.01605726219713688, -0.028341136872768402, 0.011204793117940426, -0.0013649690663442016, -0.007770737633109093, -0.011679859831929207, -0.01315256766974926, 0.003892155596986413, 0.0054870229214429855, 0.02175806649029255, 0.015025688335299492, -0.005677049979567528, 0.01855475641787052, -0.029752762988209724, -0.029427003115415573, -0.018568331375718117, -0.0468008816242218, 0.01562291570007801, -0.00020794782903976738, -0.010444685816764832, 
-0.013200074434280396, -0.017102409154176712, -0.01836473122239113, -0.0013802391476929188, -0.0032864452805370092, -0.012270300649106503, -0.014713501557707787, -0.017943957820534706, -0.00893125869333744, -0.003235545242205262, 0.024866361171007156, 0.010831526480615139, -0.03477489948272705, -0.013539407402276993, -0.00636929040774703, 0.0005938336835242808, 0.00823901779949665, 0.019898518919944763, -0.014387741684913635, -0.022626759484410286, -0.009562418796122074, -0.04631223902106285, 0.015351449139416218, -0.01170022040605545, 0.00392948230728507, 0.013980541378259659, -0.03458487242460251, 0.0012971024261787534, -0.01843259669840336, 0.003157498547807336, -0.011252299882471561, -0.004017708823084831, -0.0014574375236406922, -0.0033882453572005033, -0.0005323294899426401, -0.009976405650377274, -0.01661377027630806, 0.03371617943048477, 0.013166140764951706, -0.0005238461308181286, 0.003831075504422188, 0.02667161449790001, 0.027906788513064384, 0.016532329842448235, 0.01184952724725008, 0.002119137905538082, 0.014442034997045994, -0.01921985112130642, 0.015907956287264824, -0.001966437790542841, 0.0007656212546862662, -0.011578059755265713, -0.037761036306619644, -0.00785896461457014, 0.023644760251045227, -0.02014283835887909, -0.019369157031178474, -0.011795233003795147, 0.01790323667228222, 0.01077723316848278, 0.01645088940858841, -0.03472060710191727, -0.021798785775899887, 0.016817370429635048, -0.008401897735893726, -0.008442617952823639, -0.013580127619206905, -0.014944248832762241, -0.005931550171226263, 0.01076365914195776, 0.02063147909939289, 0.014197714626789093, 0.024513453245162964, -0.04259314388036728, -0.024174120277166367, -0.018894091248512268, -0.009236658923327923, 0.02337329275906086, 0.01184952724725008, -0.005076429806649685, 0.019464170560240746, 0.048674002289772034, -0.007770737633109093, 0.02185308001935482, -0.015840088948607445, 0.017726782709360123, 0.0026484981644898653, -0.015595768578350544, 0.016328729689121246, 
0.0215951856225729, -0.009603139013051987, -0.027879642322659492, -0.021540893241763115, 0.00461493618786335, 0.0265358816832304, 0.01061435230076313, -0.00565329659730196, 0.005459876265376806, -0.00517483614385128, -0.015514329075813293, 0.02198881283402443, 0.020875798538327217, -0.0077300178818404675, -0.01809326373040676, -0.00998997874557972, 0.007702871225774288, 0.010824739933013916, -0.0015906259650364518, -0.007064924109727144, -0.005520956590771675, 0.01091296598315239, -0.024879934266209602, -0.005317356437444687, 0.017224570736289024, 0.03092007152736187, 0.0096438592299819, 0.009053418412804604, -0.009019484743475914, -0.009521698579192162, 0.020957238972187042, 0.021744491532444954, 0.012534980662167072, -0.0018001643475145102, -0.002041091211140156, -0.02228742651641369, -0.007146364077925682, 0.017876090481877327, -0.02983420342206955, -0.011279446072876453, 0.019396305084228516, 0.013966968283057213, 0.0326303131878376, 0.008883751928806305, 0.008754804730415344, 0.01717027649283409, -0.008442617952823639, 0.01901625096797943, 0.004750669468194246, -0.010770446620881557, -0.028422575443983078, 0.01566363498568535, 0.02231457270681858, 0.0021564646158367395, 0.02300681360065937, -0.024513453245162964, 0.01422486174851656, 0.017048116773366928, 0.01664091646671295, -0.00942668505012989, 0.020563611760735512, -0.006606823764741421, -0.010281805880367756, -0.021500172093510628, -0.024146974086761475, -0.012989687733352184, -0.017061689868569374, -0.00931131187826395, -0.02316969260573387, 0.021269425749778748, -0.0299970842897892, 0.05803960561752319, 0.008876965381205082, -0.01991209201514721, 0.00787932425737381, 0.001379390829242766, -0.014632062055170536, 0.01345118135213852, 0.007417831104248762, -0.01882622390985489, -0.0020427878480404615, -0.011734153144061565, -0.011992046609520912, -0.014591341838240623, -0.009155218489468098, -0.004760849289596081, -0.011218366213142872, 0.0023227380588650703, 0.013804088346660137, 0.01252819411456585, 
-0.020875798538327217, 0.01935558393597603, 0.028721189126372337, 0.006477877032011747, 0.018541183322668076, -0.03447628766298294, -0.012860740534961224, 0.01727886311709881, -0.014238434843719006, 0.013356167823076248, -0.02570790797472, 0.017455317080020905, -0.01415699440985918, -0.037462420761585236, -0.007641790900379419, -0.007214230950921774, 0.017088836058974266, -0.004472415894269943, -0.011903820559382439, -0.011408393271267414, 0.018147557973861694, 0.0014464091509580612, 0.016260862350463867, -0.017360303550958633, -0.027933936566114426, 0.015609342604875565, -0.0033186818473041058, -0.021187985315918922, 0.016274435445666313, -0.03059431165456772], index=0, object='embedding')], model='text-embedding-ada-002', object='list', usage=Usage(prompt_tokens=4, total_tokens=4))
In [32]:
def get_embedding(text, model="text-embedding-ada-002"):
    # Replace newlines with spaces before the text is sent for embedding
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding
In [33]:
vec = get_embedding("I am trying a new text \n And see what happens")
In [34]:
len(vec)  # dimensionality of the embedding vector
Out[34]:
1536
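Retrieval over these 1536-dimensional vectors comes down to comparing them. A minimal sketch of cosine similarity, using short made-up vectors in place of real embeddings; the helper name `cosine_similarity` is an assumption and not part of this notebook:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy 3-dimensional stand-ins for real embeddings
v1 = [1.0, 0.0, 0.0]
v2 = [2.0, 0.0, 0.0]   # same direction as v1, different length
v3 = [0.0, 1.0, 0.0]   # orthogonal to v1

same_direction = cosine_similarity(v1, v2)
orthogonal = cosine_similarity(v1, v3)
```

Cosine similarity ignores vector length, so `v1` and `v2` score as identical in direction, while `v1` and `v3` score zero.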
Vector DB Setup¶
In [46]:
# The index holds 1536-dimensional vectors, matching the embedding size above
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your key")  # placeholder: substitute your own Pinecone API key
# Older client API, kept for reference:
# API_KEY = "your key"
# ENV = "your env"
# pinecone.init(api_key = API_KEY, environment = ENV)
# pinecone.create_index("ai-agent", dimension=1536, metric='dotproduct')
index = pc.Index("squad-data")
Creating the index in Pinecone¶
In [47]:
# index_name = "squad-data"
# pc.create_index(
#     name=index_name,
#     dimension=1536,
#     metric="cosine",
#     spec=ServerlessSpec(
#         cloud="aws",
#         region="us-east-1"
#     )
# )
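The creation cell is commented out, presumably because the index already exists. A hedged sketch of a guard that makes the cell safe to re-run: it assumes a Pinecone client whose `list_indexes()` result exposes a `names()` method, and the helper name `ensure_index` is hypothetical.

```python
def ensure_index(client, name, dimension, metric, spec):
    """Create the index only if it is not already listed.

    Returns True when the index was created, False when it already existed.
    Assumes client.list_indexes() returns an object with a names() method.
    """
    existing = client.list_indexes().names()
    if name in existing:
        return False
    client.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
    return True
```

With the objects from this notebook, a call would look like `ensure_index(pc, "squad-data", 1536, "cosine", ServerlessSpec(cloud="aws", region="us-east-1"))`.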
In [48]:
# index.delete(delete_all=True)  # uncomment to remove every vector from the index
Indexing¶
In [49]:
df_sample = df.sample(10000, random_state=45)
batch_size = 20  # records per batch; the free tier limit cited here is 20 RPM (requests per minute)
In [50]:
# embedding function from OpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
model_name = "text-embedding-ada-002"
embed = OpenAIEmbeddings(
    model = model_name,
    openai_api_key= api_key)
In [51]:
from tqdm.auto import tqdm
import time
In [52]:
%%time
for i in tqdm(range(0, len(df_sample), batch_size)):
    i_end = min(i+batch_size, len(df_sample))
    batch = df_sample.iloc[i:i_end]
    # NOTE: the metadata key is spelled "titile"; it is kept as-is because the
    # same key is used as text_key when the vector store is defined later
    meta_data = [{"titile" : row['title'],
                  "context": row['context']}
                 for _, row in batch.iterrows()]  # "_" avoids shadowing the outer index i
    # embedding
    docs = batch['context'].tolist()  # pd.Series to Python list
    # emb_vectors = [get_embedding(doc, MODEL) for doc in docs]
    emb_vectors = embed.embed_documents(docs)  # list of embedding vectors (list of lists)
    ids = batch['id'].tolist()
    # upsert (id, vector, metadata) tuples
    to_upsert = list(zip(ids, emb_vectors, meta_data))
    index.upsert(vectors=to_upsert)
    time.sleep(20)  # pause between batches for the rate limit (author's note: 8s for 50 data points)
CPU times: user 59 s, sys: 926 ms, total: 59.9 s
Wall time: 3h 9min 36s
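Most of that wall time is the deliberate pause rather than work. A small worked calculation using only numbers already in this notebook (10000 sampled rows, batches of 20, a 20-second sleep, and the reported wall time); the variable names are illustrative:

```python
import math

n_records = 10_000     # rows in df_sample
batch_size = 20        # records per batch
sleep_seconds = 20     # time.sleep(20) after every batch

n_batches = math.ceil(n_records / batch_size)   # matches the 500 steps shown by tqdm
sleep_total = n_batches * sleep_seconds         # seconds spent sleeping
wall_total = 3 * 3600 + 9 * 60 + 36             # reported wall time, in seconds
api_total = wall_total - sleep_total            # what remains for embedding and upsert calls

print(n_batches)                          # 500
print(sleep_total, wall_total, api_total) # 10000 11376 1376
```

By this arithmetic about 88% of the run is sleeping, and the embedding and upsert calls themselves account for roughly 23 minutes.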
In [54]:
df.shape[0]/3600  # hours to load every context at ~1 record per second: about 5 hrs; the free tier is estimated to take 15 hrs
# roughly 14000 records per dollar
Out[54]:
5.2475
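The same back-of-the-envelope estimate can be wrapped in two small helpers. This is a sketch under the notebook's own rough figures (about one record per second and about 14000 records per dollar); neither number is an official rate, and the function names are hypothetical.

```python
RECORDS_PER_DOLLAR = 14_000  # the notebook's rough figure, not an official price

def load_hours(n_records, seconds_per_record=1.0):
    """Hours needed to load n_records at a fixed per-record pace."""
    return n_records * seconds_per_record / 3600

def embedding_cost_usd(n_records, records_per_dollar=RECORDS_PER_DOLLAR):
    """Approximate embedding spend in dollars."""
    return n_records / records_per_dollar

full_dataset = 18_891  # df.shape[0], implied by the 5.2475 result above
sample = 10_000        # rows in df_sample

print(load_hours(full_dataset))
print(embedding_cost_usd(sample))
```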
Using¶
In [80]:
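# LangChain vector store definition
# Aliased on import so it does not shadow pinecone.Pinecone imported earlier
from langchain.vectorstores import Pinecone as LangchainPinecone
from langchain_pinecone import PineconeVectorStore
# text_key names the metadata field returned as page_content; "titile" (sic) is the key written during indexing
vectorstore = LangchainPinecone(index=index, embedding=embed.embed_query, text_key="titile")
# Alternative store that returns the passage text as page_content; it takes an Embeddings object, so embed is passed
vector_store = PineconeVectorStore(index=index, embedding=embed, text_key="context")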
In [81]:
query = "Virgin Mary"
In [83]:
# pure semantic, non generative, non agent based
vectorstore.similarity_search(query, k=10)
Out[83]:
[Document(metadata={'context': "The Perpetual Virginity of Mary asserts Mary's real and perpetual virginity even in the act of giving birth to the Son of God made Man. The term Ever-Virgin (Greek ἀειπάρθενος) is applied in this case, stating that Mary remained a virgin for the remainder of her life, making Jesus her biological and only son, whose conception and birth are held to be miraculous. While the Orthodox Churches hold the position articulated in the Protoevangelium of James that Jesus' brothers and sisters are older children of Joseph the Betrothed, step-siblings from an earlier marriage that left him widowed, Roman Catholic teaching follows the Latin father Jerome in considering them Jesus' cousins."}, page_content='Mary_(mother_of_Jesus)'), Document(metadata={'context': 'The popularity of this particular representation of The Immaculate Conception spread across the rest of Europe, and has since remained the best known artistic depiction of the concept: in a heavenly realm, moments after her creation, the spirit of Mary (in the form of a young woman) looks up in awe at (or bows her head to) God. The moon is under her feet and a halo of twelve stars surround her head, possibly a reference to "a woman clothed with the sun" from Revelation 12:1-2. Additional imagery may include clouds, a golden light, and cherubs. In some paintings the cherubim are holding lilies and roses, flowers often associated with Mary.'}, page_content='Immaculate_Conception'), Document(metadata={'context': 'Mary resided in "her own house"[Lk.1:56] in Nazareth in Galilee, possibly with her parents, and during her betrothal — the first stage of a Jewish marriage — the angel Gabriel announced to her that she was to be the mother of the promised Messiah by conceiving him through the Holy Spirit, and she responded, "I am the handmaid of the Lord. Let it be done unto me according to your word." 
After a number of months, when Joseph was told of her conception in a dream by "an angel of the Lord", he planned to divorce her; but the angel told him to not hesitate to take her as his wife, which Joseph did, thereby formally completing the wedding rites.[Mt 1:18-25]'}, page_content='Mary_(mother_of_Jesus)'), Document(metadata={'context': 'From the early stages of Christianity, belief in the virginity of Mary and the virgin conception of Jesus, as stated in the gospels, holy and supernatural, was used by detractors, both political and religious, as a topic for discussions, debates and writings, specifically aimed to challenge the divinity of Jesus and thus Christians and Christianity alike. In the 2nd century, as part of the earliest anti-Christian polemics, Celsus suggested that Jesus was the illegitimate son of a Roman soldier named Panthera. The views of Celsus drew responses from Origen, the Church Father in Alexandria, Egypt, who considered it a fabricated story. How far Celsus sourced his view from Jewish sources remains a subject of discussion.'}, page_content='Mary_(mother_of_Jesus)'), Document(metadata={'context': 'Some Western writers claim that the immaculate conception of Mary is a teaching of Islam. Thus, commenting in 1734 on the passage in the Qur\'an, "I have called her Mary; and I commend her to thy protection, and also her issue, against Satan driven away with stones", George Sale stated: "It is not improbable that the pretended immaculate conception of the virgin Mary is intimated in this passage. For according to a tradition of Mohammed, every person that comes into the world, is touched at his birth by the devil, and therefore cries out, Mary and her son only excepted; between whom, and the evil spirit God placed a veil, so that his touch did not reach them. 
And for this reason they say, neither of them were guilty of any sin, like the rest of the children of Adam."'}, page_content='Immaculate_Conception'), Document(metadata={'context': 'The papal bull defining the dogma, Ineffabilis Deus, mentioned in particular the patrististic interpretation of Genesis 3:15 as referring to a woman, Mary, who would be eternally at enmity with the evil serpent and completely triumphing over him. It said the Fathers saw foreshadowings of Mary\'s "wondrous abundance of divine gifts and original innocence" "in that ark of Noah, which was built by divine command and escaped entirely safe and sound from the common shipwreck of the whole world; in the ladder which Jacob saw reaching from the earth to heaven, by whose rungs the angels of God ascended and descended, and on whose top the Lord himself leaned; in that bush which Moses saw in the holy place burning on all sides, which was not consumed or injured in any way but grew green and blossomed beautifully; in that impregnable tower before the enemy, from which hung a thousand bucklers and all the armor of the strong; in that garden enclosed on all sides, which cannot be violated or corrupted by any deceitful plots; in that resplendent city of God, which has its foundations on the holy mountains; in that most august temple of God, which, radiant with divine splendours, is full of the glory of God; and in very many other biblical types of this kind."'}, page_content='Immaculate_Conception'), Document(metadata={'context': '"So he carried me away in the spirit into the wilderness: and I saw a woman sit upon a scarlet coloured beast, full of names of blasphemy, having seven heads and ten horns. 
"And the woman was arrayed in purple and scarlet colour, and decked with gold and precious stones and pearls, having a golden cup in her hand full of abominations and filthiness of her fornication: "And upon her forehead was a name written a mystery: Babylon the Great, the Mother of Harlots and of all the abominations of the earth: And I saw the woman drunken with the blood of the saints, and with the blood of the martyrs of Jesus.'}, page_content='Red'), Document(metadata={'context': 'Protestants in general reject the veneration and invocation of the Saints.:1174 Protestants typically hold that Mary was the mother of Jesus, but was an ordinary woman devoted to God. Therefore, there is virtually no Marian veneration, Marian feasts, Marian pilgrimages, Marian art, Marian music or Marian spirituality in today\'s Protestant communities. Within these views, Roman Catholic beliefs and practices are at times rejected, e.g., theologian Karl Barth wrote that "the heresy of the Catholic Church is its Mariology".'}, page_content='Mary_(mother_of_Jesus)'), Document(metadata={'context': 'Despite Martin Luther\'s harsh polemics against his Roman Catholic opponents over issues concerning Mary and the saints, theologians appear to agree that Luther adhered to the Marian decrees of the ecumenical councils and dogmas of the church. He held fast to the belief that Mary was a perpetual virgin and the Theotokos or Mother of God. Special attention is given to the assertion that Luther, some three-hundred years before the dogmatization of the Immaculate Conception by Pope Pius IX in 1854, was a firm adherent of that view. Others maintain that Luther in later years changed his position on the Immaculate Conception, which, at that time was undefined in the Church, maintaining however the sinlessness of Mary throughout her life. 
For Luther, early in his life, the Assumption of Mary was an understood fact, although he later stated that the Bible did not say anything about it and stopped celebrating its feast. Important to him was the belief that Mary and the saints do live on after death. "Throughout his career as a priest-professor-reformer, Luther preached, taught, and argued about the veneration of Mary with a verbosity that ranged from childlike piety to sophisticated polemics. His views are intimately linked to his Christocentric theology and its consequences for liturgy and piety." Luther, while revering Mary, came to criticize the "Papists" for blurring the line, between high admiration of the grace of God wherever it is seen in a human being, and religious service given to another creature. He considered the Roman Catholic practice of celebrating saints\' days and making intercessory requests addressed especially to Mary and other departed saints to be idolatry. His final thoughts on Marian devotion and veneration are preserved in a sermon preached at Wittenberg only a month before his death:'}, page_content='Mary_(mother_of_Jesus)'), Document(metadata={'context': "The Qur'an relates detailed narrative accounts of Maryam (Mary) in two places, Qur'an 3:35–47 and 19:16–34. These state beliefs in both the Immaculate Conception of Mary and the Virgin birth of Jesus. The account given in Sura 19 is nearly identical with that in the Gospel according to Luke, and both of these (Luke, Sura 19) begin with an account of the visitation of an angel upon Zakariya (Zecharias) and Good News of the birth of Yahya (John), followed by the account of the annunciation. It mentions how Mary was informed by an angel that she would become the mother of Jesus through the actions of God alone."}, page_content='Mary_(mother_of_Jesus)')]
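In the results above, `page_content` carries the article title and the passage text sits under `metadata['context']`, because the store was built with `text_key="titile"`. A minimal sketch for pulling both fields out of such results; it uses `SimpleNamespace` stand-ins with placeholder passages instead of real LangChain `Document` objects, and the helper name is hypothetical:

```python
from types import SimpleNamespace

def split_title_and_context(documents, context_key="context"):
    """Return (title, context) pairs from Document-like search results."""
    pairs = []
    for doc in documents:
        # page_content holds the title; the passage lives in metadata
        pairs.append((doc.page_content, doc.metadata.get(context_key, "")))
    return pairs

# Stand-ins shaped like the results above (passages replaced by placeholders)
hits = [
    SimpleNamespace(page_content="Mary_(mother_of_Jesus)",
                    metadata={"context": "placeholder passage A"}),
    SimpleNamespace(page_content="Immaculate_Conception",
                    metadata={"context": "placeholder passage B"}),
]

pairs = split_title_and_context(hits)
titles = [title for title, _ in pairs]
```

The same function should work on the list returned by `vectorstore.similarity_search(query, k=10)`, since it only relies on the `page_content` and `metadata` attributes.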
Define QA Agent¶
In [85]:
from langchain.chat_models import ChatOpenAI
from langchain.chains.conversation.memory \
    import ConversationBufferWindowMemory
from langchain.chains import RetrievalQA
# OpenAI LLM
llm = ChatOpenAI(openai_api_key = api_key,
                 model_name = 'gpt-3.5-turbo',
                 temperature = 0.0)
# conversational memory: keeps the last k=5 exchanges
conv_mem = ConversationBufferWindowMemory(
    memory_key = 'chat_history',
    k = 5,
    return_messages =True)
# retrieval qa: the "stuff" chain type puts all retrieved documents into a single prompt
qa = RetrievalQA.from_chain_type(
    llm = llm,
    chain_type = "stuff",
    retriever = vectorstore.as_retriever())
# https://python.langchain.com/en/latest/modules/chains/index_examples/question_answering.html
# https://docs.langchain.com/docs/components/chains/index_related_chains
/tmp/ipykernel_91935/3167501936.py:9: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 1.0. An updated version of the class exists in the :class:`~langchain-openai package and should be used instead. To use it run `pip install -U :class:`~langchain-openai` and import as `from :class:`~langchain_openai import ChatOpenAI``.
  llm = ChatOpenAI(openai_api_key = api_key,
/tmp/ipykernel_91935/3167501936.py:14: LangChainDeprecationWarning: Please see the migration guide at: https://python.langchain.com/docs/versions/migrating_memory/
  conv_mem = ConversationBufferWindowMemory(
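The "stuff" chain type is the simplest retrieval strategy: every retrieved document is placed into one prompt together with the question. A plain-Python illustration of the idea; the prompt wording and the function name are invented for this sketch and are not LangChain's actual template:

```python
def build_stuff_prompt(question, contexts):
    """Join every retrieved passage into one context block, then append the question."""
    context_block = "\n\n".join(contexts)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context_block}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_stuff_prompt(
    "When was it founded?",
    ["passage one", "passage two"],
)
print(prompt)
```

As far as I know, the chain formats each document from its `page_content` by default, so the field chosen as `text_key` decides what text the model actually sees. With `text_key="titile"` that would be the article title rather than the passage, which is worth checking when judging how grounded the answers are.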
Invoking Retrieval QA¶
In [86]:
query = "When was university of notredame establish"
qa.run(query) # retrieving the info
/tmp/ipykernel_91935/2341944140.py:2: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.
  qa.run(query) # retrieving the info
Out[86]:
'The University of Notre Dame was established on November 26, 1842.'
In [87]:
query = "who established the university of notredame"
qa.run(query)
Out[87]:
'The University of Notre Dame was established by the Congregation of Holy Cross, a Catholic religious order also known as the Holy Cross Fathers.'
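The deprecation warning above recommends `invoke` over `run`. A hedged sketch of a thin wrapper, assuming the chain uses RetrievalQA's default input key `"query"` and output key `"result"`; the helper name `ask` is hypothetical:

```python
def ask(chain, question):
    """Call a RetrievalQA-style chain through invoke() and return only the answer text.

    Assumes the default keys: input under "query", answer under "result".
    """
    response = chain.invoke({"query": question})
    return response["result"]
```

With the chain defined in this notebook, `ask(qa, query)` would stand in for `qa.run(query)`.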
In [88]:
from langchain.agents import Tool
tools = [
    Tool(
        name = 'Knowledge Base',
        func = qa.run,
        description = ('use this when answering based on knowledge')
    )
]
In [89]:
from langchain.agents import initialize_agent
from langchain.agents import AgentType
agent = initialize_agent(
    agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
    tools=tools,
    llm=llm,
    verbose=True,
    max_iterations=3,
    early_stopping_method='generate',
    memory=conv_mem
)
/tmp/ipykernel_91935/2874294646.py:4: LangChainDeprecationWarning: LangChain agents will continue to be supported, but it is recommended for new use cases to be built with LangGraph. LangGraph offers a more flexible and full-featured framework for building agents, including support for tool-calling, persistence of state, and human-in-the-loop workflows. See LangGraph documentation for more details: https://langchain-ai.github.io/langgraph/. Refer here for its pre-built ReAct agent: https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/
  agent = initialize_agent(
In [90]:
agent("when was university of notredame established") # conversational, ChatGPT-style interaction
/tmp/ipykernel_91935/3694372788.py:1: LangChainDeprecationWarning: The method `Chain.__call__` was deprecated in langchain 0.1.0 and will be removed in 1.0. Use :meth:`~invoke` instead.
  agent("when was university of notredame established") # chat gpt kind

> Entering new AgentExecutor chain...
```json
{
    "action": "Knowledge Base",
    "action_input": "University of Notre Dame establishment date"
}
```
Observation: The University of Notre Dame was established on November 26, 1842.
Thought:```json
{
    "action": "Final Answer",
    "action_input": "The University of Notre Dame was established on November 26, 1842."
}
```

> Finished chain.
Out[90]:
{'input': 'when was university of notredame established', 'chat_history': [], 'output': 'The University of Notre Dame was established on November 26, 1842.'}
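Each agent step in the trace is a fenced JSON blob with an `action` and an `action_input`. A simplified sketch of how such a blob can be parsed; this is an illustration written for this notebook, not LangChain's own output parser, and the names are hypothetical:

```python
import json
import re

# Matches a JSON object wrapped in a triple-backtick fence, with an optional "json" tag
ACTION_BLOCK = re.compile(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", re.DOTALL)

def parse_action(llm_text):
    """Extract (action, action_input) from the first fenced JSON blob in llm_text."""
    match = ACTION_BLOCK.search(llm_text)
    if match is None:
        raise ValueError("no JSON action block found")
    blob = json.loads(match.group(1))
    return blob["action"], blob["action_input"]

fence = "`" * 3  # built at runtime to keep literal fences out of this example
sample = (
    f"{fence}json\n"
    '{"action": "Knowledge Base", '
    '"action_input": "University of Notre Dame establishment date"}\n'
    f"{fence}"
)
action, action_input = parse_action(sample)
```

When `action` names a tool, the executor calls it with `action_input`; when it is `"Final Answer"`, the input is returned to the user, as in the trace above.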
In [91]:
agent("who founded the university")
> Entering new AgentExecutor chain...
```json
{
    "action": "Knowledge Base",
    "action_input": "University of Notre Dame founding"
}
```
Observation: The University of Notre Dame was founded on November 26, 1842, by Rev. Edward Sorin, a French priest of the Congregation of Holy Cross. The university was established in Notre Dame, Indiana, USA.
Thought:```json
{
    "action": "Final Answer",
    "action_input": "The University of Notre Dame was founded on November 26, 1842, by Rev. Edward Sorin, a French priest of the Congregation of Holy Cross. The university was established in Notre Dame, Indiana, USA."
}
```

> Finished chain.
Out[91]:
{'input': 'who founded the university', 'chat_history': [HumanMessage(content='when was university of notredame established', additional_kwargs={}, response_metadata={}), AIMessage(content='The University of Notre Dame was established on November 26, 1842.', additional_kwargs={}, response_metadata={})], 'output': 'The University of Notre Dame was founded on November 26, 1842, by Rev. Edward Sorin, a French priest of the Congregation of Holy Cross. The university was established in Notre Dame, Indiana, USA.'}
In [92]:
agent("20+6")
> Entering new AgentExecutor chain...
```json
{
    "action": "Final Answer",
    "action_input": "26"
}
```

> Finished chain.
Out[92]:
{'input': '20+6', 'chat_history': [HumanMessage(content='when was university of notredame established', additional_kwargs={}, response_metadata={}), AIMessage(content='The University of Notre Dame was established on November 26, 1842.', additional_kwargs={}, response_metadata={}), HumanMessage(content='who founded the university', additional_kwargs={}, response_metadata={}), AIMessage(content='The University of Notre Dame was founded on November 26, 1842, by Rev. Edward Sorin, a French priest of the Congregation of Holy Cross. The university was established in Notre Dame, Indiana, USA.', additional_kwargs={}, response_metadata={})], 'output': '26'}
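The `chat_history` list grows by one human/AI pair per call, and with `k = 5` only the most recent five exchanges are kept. A minimal sketch of that windowing behaviour using a bounded deque; `WindowMemory` is a hypothetical stand-in, not the LangChain class:

```python
from collections import deque

class WindowMemory:
    """Keep only the last k (human, ai) exchanges."""

    def __init__(self, k):
        self._turns = deque(maxlen=k)  # older exchanges fall off automatically

    def save(self, human, ai):
        self._turns.append((human, ai))

    def history(self):
        """Flatten the stored exchanges into an ordered list of (role, text) messages."""
        messages = []
        for human, ai in self._turns:
            messages.append(("human", human))
            messages.append(("ai", ai))
        return messages

memory = WindowMemory(k=2)
memory.save("first question", "first answer")
memory.save("second question", "second answer")
memory.save("third question", "third answer")
recent = memory.history()  # only the second and third exchanges remain
```

The three calls in this notebook stay well inside the window, which is why every earlier message still appears in the last output.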